mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Eduard van Valkenburg 6acab3d1d6 Python: [BREAKING] Standardize model selection on model (#4999)

* Refactor Anthropic model option and provider clients

Rename the Anthropic client model option from model_id to model, add provider-specific Anthropic wrappers for Foundry, Bedrock, and Vertex, and expose them through the Anthropic, Foundry, Amazon, and Google namespaces. Update core option handling, docs, samples, and tests accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Anthropic skills sample typing

Cast the Anthropic beta client to Any in the skills sample so the pre-commit sample pyright check no longer fails on beta skills and files endpoints that are not exposed by the current SDK stubs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* undo sample mypy

* Retry CI after transient external failures

Retrigger PR validation after an unrelated Copilot review workflow SAML failure and a transient external tau2 git fetch failure in the Windows Python test setup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback on model option merging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Anthropic compatibility review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* moved all to `model`

* fixes for azure ai search

* Python: standardize remaining sample env var names

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix foundry-local pyright compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated env vars in cicd

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

6acab3d1d6 · 2026-04-01 19:00:18 +00:00

| File | Last commit | Date |
| --- | --- | --- |
| .env.example | Python: Foundry Evals integration for Python (#4750) | 2026-03-31 15:53:06 +00:00 |
| evaluate_agent_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| evaluate_mixed_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| evaluate_multiturn_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| evaluate_tool_calls_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| evaluate_traces_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| evaluate_workflow_sample.py | Python: [BREAKING] Standardize model selection on model (#4999) | 2026-04-01 19:00:18 +00:00 |
| README.md | Python: Foundry Evals integration for Python (#4750) | 2026-03-31 15:53:06 +00:00 |

README.md

Foundry Evals Integration Samples

These samples demonstrate evaluating agent-framework agents using Azure AI Foundry's built-in evaluators.

Available Evaluators

| Category | Evaluators |
| --- | --- |
| Agent behavior | `intent_resolution`, `task_adherence`, `task_completion`, `task_navigation_efficiency` |
| Tool usage | `tool_call_accuracy`, `tool_selection`, `tool_input_accuracy`, `tool_output_utilization`, `tool_call_success` |
| Quality | `coherence`, `fluency`, `relevance`, `groundedness`, `response_completeness`, `similarity` |
| Safety | `violence`, `sexual`, `self_harm`, `hate_unfairness` |
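
When choosing evaluators for a run, it can help to check the names against the table before submitting anything. The sketch below copies the evaluator names from the table; the dictionary layout and the `category_of` and `group_by_category` helpers are illustrative assumptions and are not part of agent-framework or Azure AI Foundry.

```python
# Evaluator names are copied from the table above. The structure and the
# helper functions are hypothetical, written only for illustration.
EVALUATORS_BY_CATEGORY: dict[str, tuple[str, ...]] = {
    "Agent behavior": (
        "intent_resolution", "task_adherence",
        "task_completion", "task_navigation_efficiency",
    ),
    "Tool usage": (
        "tool_call_accuracy", "tool_selection", "tool_input_accuracy",
        "tool_output_utilization", "tool_call_success",
    ),
    "Quality": (
        "coherence", "fluency", "relevance",
        "groundedness", "response_completeness", "similarity",
    ),
    "Safety": ("violence", "sexual", "self_harm", "hate_unfairness"),
}


def category_of(evaluator: str) -> str:
    """Return the table category for an evaluator name, or raise KeyError."""
    for category, names in EVALUATORS_BY_CATEGORY.items():
        if evaluator in names:
            return category
    raise KeyError(f"unknown evaluator: {evaluator!r}")


def group_by_category(evaluators: list[str]) -> dict[str, list[str]]:
    """Group a selection by category, failing fast on a misspelled name."""
    grouped: dict[str, list[str]] = {}
    for name in evaluators:
        grouped.setdefault(category_of(name), []).append(name)
    return grouped


if __name__ == "__main__":
    print(group_by_category(["relevance", "tool_call_accuracy", "coherence"]))
```

A misspelled name such as `"relevence"` raises `KeyError` locally instead of failing later in a remote evaluation run.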

Samples

`evaluate_agent_sample.py` — Dataset Evaluation (Path 3)

Use this sample in the development inner loop. It shows two patterns, ordered from simplest to most control:

- `evaluate_agent()`: one call that runs the agent, converts the results, and evaluates them
- `FoundryEvals.evaluate()`: run the agent yourself, convert the results with `AgentEvalConverter`, inspect or modify them, then evaluate

uv run samples/05-end-to-end/evaluation/foundry_evals/evaluate_agent_sample.py
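
The difference between the two patterns is where the caller gets a chance to step in. The sketch below is a toy model of that control flow only: it does not import agent-framework, and every name in it (`EvalItem`, `run_agent`, `convert`, `evaluate`, `toy_agent`) is a hypothetical stand-in rather than the real API. See `evaluate_agent_sample.py` for the actual calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalItem:
    """Hypothetical stand-in for one converted evaluation record."""
    query: str
    response: str


def run_agent(agent: Callable[[str], str], queries: list[str]) -> list[str]:
    """Stand-in for running the agent once per query."""
    return [agent(q) for q in queries]


def convert(queries: list[str], responses: list[str]) -> list[EvalItem]:
    """Stand-in for the conversion step between agent output and eval input."""
    return [EvalItem(q, r) for q, r in zip(queries, responses)]


def evaluate(items: list[EvalItem]) -> dict[str, int]:
    """Toy scorer: counts items and how many have a non-empty response."""
    answered = sum(1 for item in items if item.response.strip())
    return {"total": len(items), "answered": answered}


def evaluate_in_one_call(agent: Callable[[str], str], queries: list[str]) -> dict[str, int]:
    """Pattern 1 shape: run, convert, and evaluate with no intermediate access."""
    return evaluate(convert(queries, run_agent(agent, queries)))


def evaluate_step_by_step(
    agent: Callable[[str], str],
    queries: list[str],
    keep: Callable[[EvalItem], bool],
) -> dict[str, int]:
    """Pattern 2 shape: the caller inspects or filters items before evaluating."""
    items = convert(queries, run_agent(agent, queries))
    items = [item for item in items if keep(item)]
    return evaluate(items)


def toy_agent(query: str) -> str:
    """Toy agent that only answers inputs containing a question mark."""
    return f"Answer to: {query}" if "?" in query else ""


if __name__ == "__main__":
    queries = ["What is 2+2?", "hello"]
    print(evaluate_in_one_call(toy_agent, queries))
    print(evaluate_step_by_step(toy_agent, queries, keep=lambda i: bool(i.response)))
```

In this toy version the step-by-step variant drops the empty response before scoring, which is the kind of inspection or modification the second pattern allows.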

`evaluate_traces_sample.py` — Trace & Response Evaluation (Path 1)

Evaluate runs that have already happened, with no changes to agent code:

- `evaluate_traces(response_ids=...)`: evaluate Responses API responses by ID
- `evaluate_traces(agent_id=...)`: evaluate agent behavior from OTel traces in App Insights

uv run samples/05-end-to-end/evaluation/foundry_evals/evaluate_traces_sample.py

Setup

Create a `.env` file in this folder, using the `.env.example` file here as a template for the required configuration.
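
A quick way to confirm the setup is complete is to compare the keys in `.env` against those in `.env.example`. The sketch below uses only the standard library and assumes both files follow the usual `KEY=value` layout, optionally prefixed with `export`; the helper names are hypothetical, and no specific variable names are assumed because they are defined in `.env.example`.

```python
from pathlib import Path


def read_env_keys(path: Path) -> set[str]:
    """Collect variable names from a KEY=value file, skipping comments and blanks."""
    keys: set[str] = set()
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_keys(example: Path, actual: Path) -> list[str]:
    """Return keys present in the example file but absent from the actual file."""
    expected = read_env_keys(example)
    if not actual.exists():
        return sorted(expected)
    return sorted(expected - read_env_keys(actual))


if __name__ == "__main__":
    folder = Path(__file__).parent
    example = folder / ".env.example"
    if example.exists():
        missing = missing_keys(example, folder / ".env")
        print("Missing keys:", ", ".join(missing) if missing else "none")
    else:
        print("No .env.example found next to this script.")
```

The check compares names only. It does not validate values, so an empty or incorrect value still passes.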

Which sample should I start with?

- "I want to test my agent during development" → `evaluate_agent_sample.py`, Pattern 1
- "I want to evaluate past agent runs" → `evaluate_traces_sample.py`
- "I want to inspect/modify eval data before submitting" → `evaluate_agent_sample.py`, Pattern 2
