
Python Samples

This directory contains samples demonstrating the capabilities of Microsoft Agent Framework for Python.

Structure

| Folder | Description |
|---|---|
| 01-get-started/ | Progressive tutorial: hello agent → hosting |
| 02-agents/ | Deep-dive by concept: tools, middleware, providers, orchestrations |
| 03-workflows/ | Workflow patterns: sequential, concurrent, state, declarative, explicit output designation |
| 04-hosting/ | Deployment: Azure Functions, Durable Tasks, A2A |
| 05-end-to-end/ | Full applications, evaluation, demos |

Getting Started

Start with 01-get-started/ and work through the numbered files:

  1. 01_hello_agent.py — Create and run your first agent
  2. 02_add_tools.py — Add function tools with @tool
  3. 03_multi_turn.py — Multi-turn conversations with AgentSession
  4. 04_memory.py — Agent memory with ContextProvider
  5. 05_functional_workflow_with_agents.py — Call agents inside a functional workflow
  6. 06_functional_workflow_basics.py — Write a workflow as a plain async function
  7. 07_first_graph_workflow.py — Build a workflow with executors and edges
  8. 08_host_your_agent.py — Host your agent via Azure Functions

Prerequisites

pip install agent-framework

Environment Variables

Samples call load_dotenv() to automatically load environment variables from a .env file in the python/ directory. This is a convenience for local development and testing.

For local development, set up your environment using any of these methods:

Option 1: Using a .env file (recommended for local development):

  1. Copy .env.example to .env in the python/ directory:
    cp .env.example .env
    
  2. Edit .env and set your values (API keys, endpoints, etc.)

Option 2: Export environment variables directly:

export FOUNDRY_PROJECT_ENDPOINT="your-foundry-project-endpoint"
export FOUNDRY_MODEL="gpt-4o"

Option 3: Using env_file_path parameter (for per-client configuration):

All client classes (e.g., OpenAIChatClient, OpenAIChatCompletionClient) support an env_file_path parameter to load environment variables from a specific file:

from agent_framework.openai import OpenAIChatClient

# Load from a custom .env file
client = OpenAIChatClient(env_file_path="path/to/custom.env")

This allows different clients to use different configuration files if needed.

For the generic OpenAI clients (OpenAIChatClient and OpenAIChatCompletionClient), the backend (OpenAI or Azure OpenAI) is selected with this precedence:

  1. Explicit Azure inputs such as credential, azure_endpoint, or api_version
  2. OPENAI_API_KEY / explicit OpenAI API-key parameters
  3. Azure environment fallback such as AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY

If you keep both OpenAI and Azure variables in your shell, the generic clients stay on OpenAI until you pass an explicit Azure input.
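The precedence rules above can be read as a small decision function. The sketch below is a hypothetical model, not agent-framework code: resolve_backend is an illustrative name, the explicit OpenAI key parameter is assumed to be called api_key, only the two Azure variables named above are checked in step 3, and having no configuration at all is assumed to be an error. See packages/openai/README.md for the real behavior.

```python
from collections.abc import Mapping
from typing import Any, Literal

# Keyword arguments the README lists as explicit Azure inputs.
AZURE_EXPLICIT_INPUTS = ("credential", "azure_endpoint", "api_version")


def resolve_backend(kwargs: Mapping[str, Any], env: Mapping[str, str]) -> Literal["azure", "openai"]:
    """Hypothetical model of the documented routing precedence for the generic OpenAI clients."""
    # 1. Any explicit Azure input wins, regardless of environment variables.
    if any(kwargs.get(name) is not None for name in AZURE_EXPLICIT_INPUTS):
        return "azure"
    # 2. An explicit API key parameter (assumed name: api_key) or OPENAI_API_KEY selects OpenAI.
    if kwargs.get("api_key") or env.get("OPENAI_API_KEY"):
        return "openai"
    # 3. Otherwise fall back to Azure environment variables.
    if env.get("AZURE_OPENAI_ENDPOINT") or env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    raise ValueError("No OpenAI or Azure OpenAI configuration found.")
```

With both OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT set, resolve_backend({}, env) picks "openai"; adding azure_endpoint to the keyword arguments switches it to "azure", which mirrors the note above.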

For the getting-started samples, you'll need at minimum:

FOUNDRY_PROJECT_ENDPOINT="your-foundry-project-endpoint"
FOUNDRY_MODEL="gpt-4o"

Consolidated sample env inventory

This is the single source of truth for package-level environment variables read by the packages included in agent-framework-core[all]. It intentionally excludes variables that are read only by standalone samples, package sample folders, or tests. When package code adds, removes, or renames an environment variable, update this table in the same change.

Example values below are illustrative. For entries not backed by a single public class, the class column names the closest public surface, helper, or package-level initialization point that reads the variable.

| Package | Class/module | Env var | Example value |
|---|---|---|---|
| agent-framework-anthropic | AnthropicClient | ANTHROPIC_API_KEY | sk-ant-api03-... |
| agent-framework-anthropic | AnthropicClient | ANTHROPIC_CHAT_MODEL | claude-sonnet-4-5-20250929 |
| agent-framework-foundry | FoundryEmbeddingClient | FOUNDRY_MODELS_ENDPOINT | https://my-endpoint.inference.ai.azure.com |
| agent-framework-foundry | FoundryEmbeddingClient | FOUNDRY_MODELS_API_KEY | env-key |
| agent-framework-foundry | FoundryEmbeddingClient | FOUNDRY_EMBEDDING_MODEL | text-embedding-3-small |
| agent-framework-foundry | FoundryEmbeddingClient | FOUNDRY_IMAGE_EMBEDDING_MODEL | Cohere-embed-v3-english |
| agent-framework-azure-ai-search | AzureAISearchContextProvider | AZURE_SEARCH_ENDPOINT | https://my-search.search.windows.net |
| agent-framework-azure-ai-search | AzureAISearchContextProvider | AZURE_SEARCH_API_KEY | search-key |
| agent-framework-azure-ai-search | AzureAISearchContextProvider | AZURE_SEARCH_INDEX_NAME | hotels-index |
| agent-framework-azure-ai-search | AzureAISearchContextProvider | AZURE_SEARCH_KNOWLEDGE_BASE_NAME | hotels-kb |
| agent-framework-azure-cosmos | CosmosHistoryProvider | AZURE_COSMOS_ENDPOINT | https://my-cosmos.documents.azure.com:443/ |
| agent-framework-azure-cosmos | CosmosHistoryProvider | AZURE_COSMOS_DATABASE_NAME | agent-history |
| agent-framework-azure-cosmos | CosmosHistoryProvider | AZURE_COSMOS_CONTAINER_NAME | messages |
| agent-framework-azure-cosmos | CosmosHistoryProvider | AZURE_COSMOS_KEY | C2F...== |
| agent-framework-bedrock | BedrockChatClient | BEDROCK_REGION | us-east-1 |
| agent-framework-bedrock | BedrockChatClient | BEDROCK_CHAT_MODEL | anthropic.claude-3-5-sonnet-20241022-v2:0 |
| agent-framework-bedrock | BedrockEmbeddingClient | BEDROCK_REGION | us-east-1 |
| agent-framework-bedrock | BedrockEmbeddingClient | BEDROCK_EMBEDDING_MODEL | amazon.titan-embed-text-v2:0 |
| agent-framework-bedrock | BedrockChatClient / BedrockEmbeddingClient | AWS_ACCESS_KEY_ID | AKIAIOSFODNN7EXAMPLE |
| agent-framework-bedrock | BedrockChatClient / BedrockEmbeddingClient | AWS_SECRET_ACCESS_KEY | wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY |
| agent-framework-bedrock | BedrockChatClient / BedrockEmbeddingClient | AWS_SESSION_TOKEN | IQoJb3JpZ2luX2VjEO7//////////wEaCXVzLXdlc3QtMiJHMEUCIQD... |
| agent-framework-copilotstudio | CopilotStudioAgent | COPILOTSTUDIOAGENT__ENVIRONMENTID | 00000000-0000-0000-0000-000000000000 |
| agent-framework-copilotstudio | CopilotStudioAgent | COPILOTSTUDIOAGENT__SCHEMANAME | cr123_agentname |
| agent-framework-copilotstudio | CopilotStudioAgent | COPILOTSTUDIOAGENT__TENANTID | 11111111-1111-1111-1111-111111111111 |
| agent-framework-copilotstudio | CopilotStudioAgent | COPILOTSTUDIOAGENT__AGENTAPPID | 22222222-2222-2222-2222-222222222222 |
| agent-framework-core | observability | ENABLE_INSTRUMENTATION | true |
| agent-framework-core | observability | ENABLE_SENSITIVE_DATA | false |
| agent-framework-core | observability | ENABLE_CONSOLE_EXPORTERS | true |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | http://localhost:4318/v1/traces |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_METRICS_ENDPOINT | http://localhost:4318/v1/metrics |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_LOGS_ENDPOINT | http://localhost:4318/v1/logs |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_PROTOCOL | grpc |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_HEADERS | api-key=demo |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_TRACES_HEADERS | api-key=trace-demo |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_METRICS_HEADERS | api-key=metric-demo |
| agent-framework-core | observability | OTEL_EXPORTER_OTLP_LOGS_HEADERS | api-key=log-demo |
| agent-framework-core | observability | OTEL_SERVICE_NAME | sample-agent |
| agent-framework-core | observability | OTEL_SERVICE_VERSION | 1.0.0 |
| agent-framework-core | observability | OTEL_RESOURCE_ATTRIBUTES | deployment.environment=dev,service.namespace=agent-framework |
| agent-framework-devui | DevUI server | DEVUI_AUTH_TOKEN | my-devui-token |
| agent-framework-foundry | FoundryChatClient | FOUNDRY_PROJECT_ENDPOINT | https://my-project.services.ai.azure.com/api/projects/my-project |
| agent-framework-foundry | FoundryChatClient | FOUNDRY_MODEL | gpt-4o |
| agent-framework-foundry | FoundryAgent | FOUNDRY_AGENT_NAME | travel-planner |
| agent-framework-foundry | FoundryAgent | FOUNDRY_AGENT_VERSION | v1 |
| agent-framework-github-copilot | GitHubCopilotAgent | GITHUB_COPILOT_CLI_PATH | copilot |
| agent-framework-github-copilot | GitHubCopilotAgent | GITHUB_COPILOT_MODEL | gpt-5 |
| agent-framework-github-copilot | GitHubCopilotAgent | GITHUB_COPILOT_TIMEOUT | 60 |
| agent-framework-github-copilot | GitHubCopilotAgent | GITHUB_COPILOT_LOG_LEVEL | info |
| agent-framework-mem0 | agent_framework_mem0 package import | MEM0_TELEMETRY | false |
| agent-framework-ollama | OllamaChatClient | OLLAMA_HOST | http://localhost:11434 |
| agent-framework-ollama | OllamaChatClient | OLLAMA_MODEL | llama3.1:8b |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | OPENAI_API_KEY | sk-proj-... |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | OPENAI_MODEL | gpt-4o-mini |
| agent-framework-openai | OpenAIChatClient | OPENAI_CHAT_MODEL | gpt-4.1-mini |
| agent-framework-openai | OpenAIChatCompletionClient | OPENAI_CHAT_COMPLETION_MODEL | gpt-4o |
| agent-framework-openai | OpenAIEmbeddingClient | OPENAI_EMBEDDING_MODEL | text-embedding-3-small |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | OPENAI_BASE_URL | https://api.openai.com/v1/ |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | OPENAI_ORG_ID | org_123456789 |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_ENDPOINT | https://my-resource.openai.azure.com/ |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_API_KEY | sk-azure-... |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_API_VERSION | 2024-10-21 |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_BASE_URL | https://my-resource.openai.azure.com/openai/v1/ |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_MODEL | gpt-4o |
| agent-framework-openai | OpenAIChatClient | AZURE_OPENAI_CHAT_MODEL | gpt-4.1 |
| agent-framework-openai | OpenAIChatCompletionClient | AZURE_OPENAI_CHAT_COMPLETION_MODEL | gpt-4o-mini |
| agent-framework-openai | OpenAIEmbeddingClient | AZURE_OPENAI_EMBEDDING_MODEL | text-embedding-3-large |
| agent-framework-openai | OpenAIChatClient / OpenAIChatCompletionClient / OpenAIEmbeddingClient | AZURE_OPENAI_RESOURCE_URL | https://cognitiveservices.azure.com/ |
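Because the inventory is keyed by package, it can also be kept as data for quick local checks, for example to see which of a package's variables are set in the current shell. The sketch below hand-copies a few rows into a hypothetical ENV_INVENTORY mapping (it is not generated from package code, and report and mask are illustrative names) and masks values so that secrets are not printed in full.

```python
from collections.abc import Mapping

# Hypothetical, hand-copied subset of the inventory above (package -> env vars).
ENV_INVENTORY: dict[str, tuple[str, ...]] = {
    "agent-framework-ollama": ("OLLAMA_HOST", "OLLAMA_MODEL"),
    "agent-framework-bedrock": (
        "BEDROCK_REGION",
        "BEDROCK_CHAT_MODEL",
        "BEDROCK_EMBEDDING_MODEL",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
    ),
}


def mask(value: str, visible: int = 4) -> str:
    """Show only the first few characters of a value; fully hide very short values."""
    return value[:visible] + "..." if len(value) > visible else "***"


def report(package: str, env: Mapping[str, str]) -> dict[str, str]:
    """Map each inventoried variable for ``package`` to a masked value or '<unset>'."""
    return {
        name: mask(env[name]) if env.get(name) else "<unset>"
        for name in ENV_INVENTORY.get(package, ())
    }
```

For example, report("agent-framework-ollama", os.environ) lists OLLAMA_HOST and OLLAMA_MODEL, each with a masked value or "<unset>".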

agent-framework-openai also supports the client-specific Azure OpenAI deployment aliases listed above (AZURE_OPENAI_CHAT_MODEL, AZURE_OPENAI_CHAT_COMPLETION_MODEL, and AZURE_OPENAI_EMBEDDING_MODEL). Treat packages/openai/README.md as the authoritative reference for the exact fallback order and other package-specific behavior.

Note for production: in production environments, set environment variables through your deployment platform (e.g., Azure App Settings, Kubernetes ConfigMaps/Secrets) rather than through .env files. When no .env file is present, the load_dotenv() call in the samples does nothing, so the samples read their configuration from the system environment.

For Azure authentication, run az login before running samples.

Note on XML tags

Some sample files include XML-style snippet tags (for example <snippet_name> and </snippet_name>). These are used by our documentation tooling and can be ignored or removed when you use the samples outside this repository.
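If you copy a sample out of this repository and want to remove the tags, a short script can do it. This sketch assumes each tag sits on its own line, optionally after a # comment marker (for example `# <snippet_name>` and `# </snippet_name>`). That layout is an assumption, so review the result before using it. strip_snippet_tags is a hypothetical helper.

```python
import re

# Matches a line that holds only an optional '#' comment marker plus an
# XML-style open or close tag such as <snippet_name> or </snippet_name>.
TAG_LINE = re.compile(r"^\s*(?:#\s*)?</?[A-Za-z_][\w.-]*>\s*$")


def strip_snippet_tags(source: str) -> str:
    """Hypothetical helper: drop standalone snippet tag lines and keep every other line as-is."""
    kept = [line for line in source.splitlines(keepends=True) if not TAG_LINE.match(line)]
    return "".join(kept)
```

Lines where a tag-like string appears inside real code, such as print("<b>"), are left untouched because the pattern must match the whole line.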

Additional Resources