Some providers, e.g. Gemini, do not use the CallId mechanism to disambiguate simultaneous function calls. This can cause message lists that contain multiple turns to be filtered incorrectly.
The fix takes advantage of the expectation that Handoff Orchestration is a "single-speaker" flow: only a single AIAgent is active per "turn", and an agent's turn is not finished until all of its outstanding function calls are finished.
This lets us assume that any FunctionCallContent items with ambiguous CallIds are either in separate turns or will each have received a response before the next call with the same Id is issued.
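The pairing rule above can be sketched in Python as follows. This is only an illustration of the ordering assumption, not the actual fix: `Call`, `Result`, and `pair_calls_with_results` are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical stand-in for a function call content item."""
    call_id: str
    name: str


@dataclass
class Result:
    """Hypothetical stand-in for a function result content item."""
    call_id: str
    value: str


def pair_calls_with_results(items: list[Call | Result]) -> list[tuple[Call, Result | None]]:
    """Pair each call with the next result that carries the same id, in order."""
    pending: dict[str, int] = {}  # call_id -> index of the outstanding call in `pairs`
    pairs: list[tuple[Call, Result | None]] = []
    for item in items:
        if isinstance(item, Call):
            # Single-speaker assumption: if this id was used before, the earlier
            # call has already been answered or belongs to a previous turn.
            pending[item.call_id] = len(pairs)
            pairs.append((item, None))
        else:
            index = pending.pop(item.call_id, None)
            if index is not None:
                pairs[index] = (pairs[index][0], item)
    return pairs
```

Because a reused id is only ever matched against the most recent outstanding call, two calls that share an id but sit in different turns do not get confused with each other.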
* Add set_stop_loss tool to concurrent_builder_tool_approval sample
Add a second approval-gated tool (set_stop_loss) to the concurrent workflow
tool approval sample to demonstrate handling approval requests for different
tools in the same concurrent workflow.
Changes:
- Add set_stop_loss(symbol, stop_price) with approval_mode='always_require'
- Include new tool in both agents' tool lists
- Update agent instructions and prompt to encourage stop-loss usage
- Update docstring to reflect two approval-gated tools
- Update sample output to show mixed approval requests
Fixes #4874
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Print tool name and arguments in concurrent sample's process_event_stream (#4874)
Align process_event_stream in concurrent_builder_tool_approval.py to print
the tool name and arguments when collecting approval requests, matching the
sample output comment and the sequential_builder_tool_approval.py pattern.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add None-guard for function_call access in tool approval sample (#4874)
Add explicit None-checks before accessing function_call.name and
function_call.arguments in concurrent_builder_tool_approval.py. The
function_call field is typed Content | None, so direct attribute access
without a guard could raise AttributeError and required type: ignore
comments. The None-guard is consistent with the pattern used in
_agent_run.py and removes the suppression comments.
Also add a regression test verifying that function_call defaults to None
and that the None-guard pattern is safe.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
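A minimal sketch of the None-guard pattern described in the commit above. The `FunctionCall` and `ApprovalRequest` classes are simplified, hypothetical stand-ins and not the framework's real types; only the optional `function_call` field mirrors the description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FunctionCall:
    """Hypothetical stand-in for the content object that carries a tool call."""
    name: str
    arguments: str


@dataclass
class ApprovalRequest:
    """Hypothetical stand-in; function_call is optional and defaults to None."""
    function_call: FunctionCall | None = None


def describe_request(request: ApprovalRequest) -> str:
    """Format an approval request without assuming function_call is set."""
    call = request.function_call
    if call is None:
        # Guard first, so no attribute access happens on None and
        # no type-checker suppression comment is needed.
        return "approval request without function call"
    return f"{call.name}({call.arguments})"
```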
* Apply same function_call None-guard to sibling tool-approval samples (#4874)
Apply the same fix to sequential_builder_tool_approval.py and
group_chat_builder_tool_approval.py, which had the identical pattern
of accessing function_call.name/arguments without a None-guard.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Python: Wrapper + Samples 1st (#5177)
* Experiment
* Update dependency and add non streaming
* Add more samples
* Rename samples
* Add invocations
* Comments 1
* Comments 2
* Comments 3
* Improve README
* Add local shell sample
* WIP: Add eval and memory samples
* Update user agent prefix
* Update user agent prefix doc
* Update dependency (#5215)
* Add tests and more content types (#5235)
* Add tests
* fix tests and sample
* Fix formatting
* Remove function approval contents
* Python: Refine samples and upgrade packages (#5261)
* Refine samples and upgrade packages
* Upgrade to a new package that fixes a bug
* Update model env var
* Move samples (#5281)
* Python: Upgrade agentserver packages (#5284)
* Upgrade agentserver packages
* Fix new types
* Python: Add special handling for workflows (#5298)
* Add special handling for workflows
* Address comments
* Improve samples (#5372)
* Python: Add more types (#5378)
* Add more type supports
* Upgrade packages
* Remove TODOs in README
* Fix README
* Comments and mypy
* User agent scoped
* Fix README
* Fix pre commit
* Fix pre commit 2
* Fix pre commit 3
* Fix pre commit 4
* Fix pre commit 5
* Fix pre commit 6
* Add azure-monitor-opentelemetry to dev deps
Fixes Samples & Markdown CI failure. The PR's new transitive dep on
azure-monitor-opentelemetry-exporter (via azure-ai-agentserver-core) makes
pyright resolve the azure.monitor.opentelemetry namespace, flipping the
check_md_code_blocks diagnostic for `configure_azure_monitor` from
reportMissingImports (filtered) to reportAttributeAccessIssue (not filtered).
Installing the umbrella azure-monitor-opentelemetry package in dev makes
pyright resolve the symbol correctly, matching the install guidance the
observability README already gives users.
---------
Co-authored-by: Evan Mattson <evan.mattson@microsoft.com>
* Expose forwarded_props to agents and tools via session metadata (#5239)
Include forwarded_props from AG-UI request input_data in session.metadata
(agent runner) and function_invocation_kwargs (workflow runner) so that
agents, tools, and workflow executors can access request-level metadata
such as invocation source flags from CopilotKit.
- Add forwarded_props to base_metadata in _agent_run.py when present
- Add 'forwarded_props' to AG_UI_INTERNAL_METADATA_KEYS to filter it
from LLM-bound client metadata
- Extract forwarded_props in _workflow_run.py and pass via
function_invocation_kwargs to workflow.run()
- Accept both snake_case and camelCase keys (forwarded_props/forwardedProps)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(ag-ui): pass stream=True as literal to satisfy pyright overload resolution (#5239)
The previous fix passed stream=True via **kwargs dict, which prevented
pyright from resolving the Workflow.run() overload to the streaming
variant. Pass stream=True as an explicit keyword argument so pyright
can correctly infer the ResponseStream return type.
Also remove unused pytest import in test file.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
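To illustrate the overload issue described above, here is a small self-contained sketch. The `run` function is a hypothetical stand-in, not the real Workflow.run() signature. With a literal `stream=True`, a type checker can select the streaming overload; with the flag hidden in an unpacked dict, it generally cannot.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Literal, overload


@overload
def run(message: str, *, stream: Literal[True]) -> Iterator[str]: ...
@overload
def run(message: str, *, stream: Literal[False] = ...) -> str: ...
def run(message: str, *, stream: bool = False) -> Iterator[str] | str:
    """Return word chunks when streaming, otherwise the whole message."""
    if stream:
        return iter(message.split())
    return message


# Literal keyword: the checker can pick the Iterator[str] overload.
chunks = run("hello world", stream=True)

# Unpacked dict: works at runtime, but the checker only sees dict[str, bool]
# and cannot tell which overload applies.
kwargs = {"stream": True}
ambiguous = run("hello world", **kwargs)
```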
* fix: address PR review feedback for forwarded_props (#5239)
- Use key-presence checks instead of truthiness for forwarded_props so
empty dict {} is forwarded correctly
- Gate function_invocation_kwargs on workflow.run() signature inspection
to avoid TypeError for workflows without **kwargs
- Change _build_safe_metadata to drop (with warning) keys whose
serialized values exceed 512 chars instead of truncating into invalid
JSON
- Rewrite metadata tests to exercise _build_safe_metadata directly with
JSON-decodability and truncation assertions
- Add workflow tests for empty dict forwarded_props, stream=True
assertion, and signature-gated kwarg dropping
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
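The drop-instead-of-truncate behavior can be sketched as follows. This is an assumption-laden illustration: `build_safe_metadata` is a hypothetical simplification of `_build_safe_metadata`, and only the 512-character limit comes from the description above.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)
MAX_VALUE_CHARS = 512  # limit named in the change description


def build_safe_metadata(metadata: dict[str, Any]) -> dict[str, str]:
    """Serialize values to strings, dropping any that exceed the limit."""
    safe: dict[str, str] = {}
    for key, value in metadata.items():
        serialized = value if isinstance(value, str) else json.dumps(value)
        if len(serialized) > MAX_VALUE_CHARS:
            # Truncating here would cut the JSON mid-token, so drop the key.
            logger.warning("Dropping metadata key %r: value exceeds %d chars", key, MAX_VALUE_CHARS)
            continue
        safe[key] = serialized
    return safe
```

Every value that survives remains decodable with `json.loads`, which is the property the rewritten tests are described as asserting.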
* test: add stream=True assertions to CapturingWorkflow tests (#5239)
Guard against accidental removal of the explicit stream=True kwarg
in all forwarded_props CapturingWorkflow test cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address review feedback for #5239: Python: Expose forwardedProps to agents and tools via session metadata
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add support for the Foundry Toolbox in MAF
Introduces a Foundry Toolbox integration: FoundryChatClient gains a
get_toolbox() helper plus select_toolbox_tools(), normalize_tools in
the core package flattens tool-collection wrappers (ToolboxVersionObject
and generic iterables, while leaving Pydantic BaseModel instances
alone), and the new agent_framework.foundry namespace re-exports the
toolbox helpers. Ships with unit tests, a sample, and a design doc.
azure-ai-projects is pinned to the public >=2.0.0,<3.0 range and the
lockfile resolves from public PyPI. The toolbox test module skips when
Toolbox* types are unavailable so CI stays green until the public 2.1.0
SDK lands. OMC tooling directories (.omc/, .omx/) are gitignored.
* Update to latest azure ai projects package
* Improve sample
* Rename ADR to 0025
* Update ADR
* Apply suggestion from @alliscode
Co-authored-by: Ben Thomas <ben.thomas@microsoft.com>
* Improve samples
* Update test
---------
Co-authored-by: Ben Thomas <ben.thomas@microsoft.com>
* adds devui integration and samples
* adds unit tests for devui integration
* fix: correct formatting of copyright notice in unit test files
* fixes formatting issues
* fixes build for net8 target
* fixes formatting errors on test apphost
* adds copyright notice to multiple files and removes unnecessary using directives
* Update dotnet/aspire-integration/Aspire.Hosting.AgentFramework.DevUI/DevUIAggregatorHostedService.cs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update dotnet/aspire-integration/Aspire.Hosting.AgentFramework.DevUI/DevUIAggregatorHostedService.cs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update dotnet/tests/Aspire.Hosting.AgentFramework.DevUI.UnitTests/Aspire.Hosting.AgentFramework.DevUI.UnitTests.csproj
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update dotnet/samples/DevUIIntegration/DevUIIntegration.AppHost/DevUIIntegration.AppHost.csproj
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update dotnet/aspire-integration/Aspire.Hosting.AgentFramework.DevUI/DevUIAggregatorHostedService.cs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Refactor project files to use TargetFrameworks instead of TargetFramework for multi-targeting support; add optional port property to DevUIResource class.
* Add unit tests for DevUIAggregatorHostedService; refactor project files for TargetFrameworks support
* Refactor project files to use TargetFrameworks for multi-targeting support in DevUIIntegration samples
* Remove unnecessary using directive for Aspire.Hosting in DevUIAggregatorHostedServiceTests
* merge
* fixes Conversation routing for non-first backends
* add documentation for devui integration sample
* update project references in solution file for improved integration
* fixes package versions post merge
* move Aspire.Hosting.AgentFramework.DevUI to dotnet/src
Move the project from aspire-integration/ to src/ to be consistent
with the location of all other projects in the repo.
* move DevUI sample to samples/05-end-to-end/DevUIAspireIntegration
Move the sample from samples/DevUIIntegration/ to
samples/05-end-to-end/DevUIAspireIntegration/ to match the location
of other end-to-end samples.
* remove unnecessary net472 framework condition from sample csproj files
These projects only target net10.0, so the
Condition="'$(TargetFramework)' != 'net472'" on ItemGroup is unnecessary.
* update sample model name from gpt-4.1 to gpt-5.4
Use a more up-to-date model name in the DevUI integration samples.
* Revert "remove unnecessary net472 framework condition from sample csproj files"
This reverts commit 08cf41253b.
* fix: use TargetFrameworks to override multi-targeting from Directory.Build.props
The parent Directory.Build.props sets TargetFrameworks to net10.0;net472,
which overrides the singular TargetFramework in each csproj. Use the plural
TargetFrameworks property set to net10.0 only to properly override it, and
remove the now-unnecessary net472 condition on ItemGroup.
* fixes aspire config
* fix: update Microsoft.Extensions packages to version 10.0.1
* Address Copilot review feedback on DevUI Aspire integration
- Fix request body dropping in ProxyConversationsAsync: always read the
body when ContentLength > 0 before routing, then pass it through to
all proxy calls (previously null was passed when backend was resolved
from query param or conversation map)
- Fix resource leak: dispose aggregator on startup failure in catch block
- Fix XML docs: accurately describe embedded resource serving behavior
- Remove reflection from DevUIResourceTests (InternalsVisibleTo already set)
- Make sensitive telemetry conditional on Development environment in samples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: update chat client version to gpt41 in both EditorAgent and WriterAgent
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Roger Barreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix CopilotStudioAgent to reuse existing conversation on session (#5285)
CopilotStudioAgent unconditionally called _start_new_conversation() in both
_run_impl and _run_stream_impl, ignoring any existing service_session_id on
the session. Add a guard to only start a new conversation when there is no
existing service_session_id, matching the pattern used by other agents.
Also fix pre-existing pyright reportMissingImports errors for orjson in
file_history_provider samples.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
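A minimal sketch of the guard described above, using hypothetical stand-ins (`Session`, `start_new_conversation`, `ensure_conversation`) instead of the real CopilotStudioAgent internals: a new conversation is started only when the session has no service_session_id yet.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass
class Session:
    """Hypothetical stand-in for an agent session."""
    service_session_id: str | None = None


_ids = count(1)


def start_new_conversation() -> str:
    """Hypothetical stand-in for the call that opens a conversation remotely."""
    return f"conv-{next(_ids)}"


def ensure_conversation(session: Session) -> str:
    """Reuse the existing conversation id, starting one only if none exists."""
    if session.service_session_id is None:
        session.service_session_id = start_new_conversation()
    return session.service_session_id
```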
* Revert out-of-scope sample file changes
Remove unrelated orjson type-ignore comment changes from sample files
that were outside the scope of the conversation-ID reuse fix.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: Add session support for Handoff-hosted Agents
To better support using `Workflows` hosted as `AIAgents` inside of Handoff workflows, we need to make proper use of AgentSession. This raises potential issues around checkpointing and around making sure that we compute only the new incoming messages for each agent invocation.
* fix: AgentSession checkpointing using AIAgent's Serialize/Deserialize methods
We cannot rely on implicit serialization through `HandoffHostState` because we are missing type information.
* fix: Thread safety issue in `MultiPartyConversation.AllMessages`
* fix: Enable unwrapping of FunctionResultContent when ExternalRequest was wrapped into FunctionCallContent
* fix: Foundry Agents without description in Handoff
Foundry Agents without a description set will return an empty string (rather than null) for the description. This was breaking the fallback logic for `handoffReason`.
* test: Add unit tests
* Foundry Evals integration for .NET
- Core evaluation framework: EvalItem, LocalEvaluator, FunctionEvaluator, EvalChecks
- IAgentEvaluator interface with MeaiEvaluatorAdapter bridge
- AgentEvaluationExtensions for agent.EvaluateAsync() overloads
- FoundryEvals wrapping MEAI quality/safety evaluators
- ConversationSplitters (LastTurn, Full) and IConversationSplitter
- EvalItem.PerTurnItems() for multi-turn decomposition
- HasImageContent for multimodal content detection
- WorkflowEvaluationExtensions for per-agent workflow evaluation
- 7 eval samples mirroring Python parity:
02-agents/Evaluation: SimpleEval, ExpectedOutputs, Multimodal
03-workflows/Evaluation: WorkflowEval
05-end-to-end/Evaluation: FoundryQuality, MixedProviders, ConversationSplits
- Comprehensive unit tests (1958 passing)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rewrite FoundryEvals to use real Foundry Evals API
Replace MEAI evaluator shim with actual OpenAI EvaluationClient protocol
methods. FoundryEvals now creates eval definitions, submits runs, polls
for completion, and fetches per-item results server-side.
- New constructor: FoundryEvals(AIProjectClient, model, evaluators)
- Add FoundryEvalConverter for MEAI ChatMessage -> Foundry JSON format
- Add EvalId, RunId, ReportUrl to AgentEvaluationResults
- All 20 built-in evaluator constants now work (agent, tool, quality, safety)
- Remove Microsoft.Extensions.AI.Evaluation.Quality/Safety dependencies
- Update all samples for new constructor (no more ChatConfiguration)
- Replace BuildEvaluators tests with ResolveEvaluator tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add response output to CustomEvals and ExpectedOutputs samples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address review: pagination, validation, error handling, tests
FoundryEvals fixes:
- Add pagination for output items (has_more/after cursor)
- Add guard clauses for pollIntervalSeconds/timeoutSeconds <= 0
- Fix double TryGetProperty for passed field parsing
- Throw on all-tool-evaluators with no tool definitions
- Fix XML doc (default 300s, not 180s)
New tests (30 added, 1989 total):
- EvalChecks: NonEmpty, ContainsExpected (pass/fail/skip/case),
HasImageContent, ToolCallsPresent
- FoundryEvalConverter: ConvertMessage (text, image, function call,
function results fan-out, empty fallback, mixed content),
ConvertEvalItem, BuildTestingCriteria (quality/agent/tool/groundedness
data mappings), BuildItemSchema
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix review: null-refs, Data.ToString() bug, ContainsExpected, add tests
- Fix NullReferenceException in sample Response display (pattern matching)
- Fix WorkflowEvaluationExtensions Data?.ToString() producing type names
instead of message text (pattern-match ChatMessage/AgentResponse/list)
- Change EvalChecks.ContainsExpected to return Passed=false when no
ExpectedOutput (was silently passing, masking misconfiguration)
- Add EvalItem constructor tests with LastTurn/Full/null splitters
- Add FoundryEvalConverter.ConvertMessage DataContent (base64 image) test
- Add ExtractAgentData tests with ChatMessage, list, and AgentResponse data
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix review: conversation fidelity, eval caching, fallback tests
- WorkflowEvaluationExtensions: preserve full response messages (tool calls,
intermediate) instead of synthetic 2-message conversation. Cast completed
Data to AgentResponse and use Messages when available, fallback to text.
- FoundryEvals: cache evalId per schema shape (hasContext, hasTools) so
subsequent EvaluateAsync calls create runs under the same eval definition.
- MeaiEvaluatorAdapter: code already correctly passes queryMessages (not full
conversation) to IEvaluator — no change needed, verified by inspection.
- Add tests: AgentResponse full messages preservation, unknown object
ToString() fallback for ExtractAgentData.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rename AzureAI→Foundry: move eval files, update references
- Move FoundryEvals.cs and FoundryEvalConverter.cs from
Microsoft.Agents.AI.AzureAI to Microsoft.Agents.AI.Foundry
- Update namespace from AzureAI to Foundry in both files
- Add explicit usings required by Foundry project (no implicit usings)
- Move FoundryEvalConverter tests to Foundry.UnitTests project
(avoids ReplacingRedactor type conflict from dual project refs)
- Update all sample csproj references and using statements
- Remove Foundry project reference from AI UnitTests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* PR review round 4: wire up tool extraction, remove eval cache, fix null safety
- BuildEvalItem: extract tools from agent via GetService<ChatOptions>() into EvalItem.Tools (Python parity)
- FoundryEvals: remove eval ID cache - each call creates fresh definition (matches Python behavior)
- FoundryEvals: replace null-forgiving operators with descriptive InvalidOperationException
- MixedProviders sample: remove unnecessary explicit PackageReferences (transitively provided)
- FoundryEvalConverter: document that tool results take precedence over text content
- Add LocalEvaluator zero-checks test documenting 0 metrics = failed behavior
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Python-dotnet parity: 9 feature gaps filled
New checks:
- ToolCallArgsMatch() — verify tool call names + argument subset match
- ToolCalledCheck(ToolCalledMode.Any, ...) — match any of the specified tools
- ToolCalledMode enum (All/Any)
FoundryEvals enhancements:
- Default evaluators now [Relevance, Coherence, TaskAdherence] (was Relevance, Coherence)
- Auto-add ToolCallAccuracy when items have tool definitions
- EvaluateTracesAsync — evaluate by response_ids, trace_ids, or agent_id
- EvaluateFoundryTargetAsync — evaluate deployed Foundry targets
Result type enrichment:
- AgentEvaluationResults: added Status, Error, PerEvaluator, DetailedItems
- New EvalItemResult/EvalScoreResult/PerEvaluatorResult types
- FoundryEvals populates all new fields from API responses
Workflow fix:
- Skip internal executors (_*, input-conversation, end-conversation, end)
Tests: 8 new tests covering ToolCallArgsMatch, ToolCalledMode.Any, internal executor filtering
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add MeaiEvaluatorAdapter and PerTurnItems edge case tests
- 3 tests for MeaiEvaluatorAdapter: query message forwarding, synthetic
response fallback, multiple items aggregation
- 3 tests for EvalItem.PerTurnItems: empty conversation, no user messages,
system+assistant only
- StubEvaluator and StubChatClient test helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Blocking link check for outdated package in DevUI.
* Replace Dictionary<string, object> payloads with typed wire models
Introduce internal FoundryEvalWireModels.cs with compile-time-safe types
for the OpenAI Evals API wire format. The OpenAI .NET SDK (2.9.1) only
provides protocol-level methods with BinaryContent/ClientResult — no
typed request models. These internal models replace scattered dictionary
literals with [JsonPropertyName]-annotated classes, giving:
- Compile-time safety (typos become build errors)
- Single point of change when the API evolves
- IntelliSense discoverability
- Cleaner serialization via JsonPolymorphic for content items
Models: WireContentItem hierarchy (text, image, tool_call, tool_result),
WireMessage, WireEvalItemPayload, WireTestingCriterion, WireItemSchema,
WireCreateEvalRequest, WireCreateRunRequest, and data source variants.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Skip metric when Foundry returns neither score nor passed
When an evaluator returns no score and no passed value, the previous
code created BooleanMetric(name, false), which falsely failed items
via ItemPassed. Now we skip the MEAI metric entirely for indeterminate
results — the raw data remains available in DetailedItems for diagnostics.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address PR #4914 review comments: fix tool evaluator bug and add tests
- Fix duplicate ToolCallAccuracy: resolve evaluator names before checking
against ToolEvaluators set (Comment 2)
- Make FilterToolEvaluators internal for testability; add tests for the
ArgumentException edge case when all evaluators are tool-type (Comment 3)
- Add CancellationToken test for LocalEvaluator (Comment 4)
- Add EvaluateAsync integration test on Run with sequential workflow and
per-agent SubResults verification (Comment 5)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address Peter's review comments on PR #4914
- Add trailing newline to Evaluation_FoundryQuality.csproj (Comment 6)
- Make evaluator name lookups case-insensitive: switch BuiltinEvaluators,
ToolEvaluators, AgentEvaluators, and ResolveEvaluator's StartsWith check
from Ordinal to OrdinalIgnoreCase (Comment 7)
- Add Trace.TraceWarning when Foundry returns fewer results than submitted
items, indicating expected vs actual count before padding (Comment 8)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add Microsoft.Extensions.AI.Evaluation packages to Directory.Packages.props
These were removed in #5269 as unused, but are needed by the Foundry
and core evaluation integration added in this PR.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add finish_reason support to AgentResponse and AgentResponseUpdate
Add finish_reason field to AgentResponse and AgentResponseUpdate classes,
propagate it through _process_update() and map_chat_to_agent_update(),
and add comprehensive unit tests.
Fixes #4622
* feat: add finish_reason to AgentResponse and AgentResponseUpdate
* style: add copyright header to test_finish_reason.py
* docs: add finish_reason to AgentResponse and AgentResponseUpdate docstrings
* refactor: move finish_reason tests into test_types.py per review feedback
Move all finish_reason test cases from the separate test_finish_reason.py
file into test_types.py as requested by eavanvalkenburg. Tests are placed
in a new '# region finish_reason' section at the end of the file.
* fix: use model instead of model_id in _process_update
Address PR review feedback from @eavanvalkenburg — ChatResponse and
ChatResponseUpdate both use 'model', not 'model_id'.
* fix: resolve SIM102 lint error in _process_update
Combine nested if statements for AgentResponse finish_reason check
to satisfy ruff SIM102 rule, with line wrapping to stay under 120 chars.
* fix: resolve pyright reportArgumentType in map_chat_to_agent_update
Add type: ignore[arg-type] for FinishReason NewType widening when
passing ChatResponseUpdate.finish_reason to AgentResponseUpdate.
Matches existing patterns in the codebase (40+ similar ignores).
* Fix url_citation annotations dropped in streaming (#5029)
Add url_citation branch to the streaming annotation handler in
_parse_chunk_from_openai, mirroring the existing non-streaming path.
The handler creates an Annotation with type='citation', title, url,
and annotated_regions (TextSpanRegion), wrapped in Content.from_text.
Update test_streaming_annotation_added_with_unknown_type to use a
truly unknown type, and add new tests for url_citation (with and
without url).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address review feedback for #5029: Python: [Bug]: url_citation annotations silently dropped in Foundry streaming (SharePoint grounding citations lost)
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
- Update Anthropic from 12.11.0 to 12.13.0
- Update Anthropic.Foundry from 0.4.2 to 0.5.0
- Change Anthropic project from release candidate to preview
- Add new IBetaService members (Agents, Environments, Sessions, Vaults) to test mock
Fixes #5246
When a custom @executor transforms agent output and sends a plain str,
the downstream AgentExecutor.from_str handler loses the full conversation
context. This adds a with_text() helper that creates a new
AgentExecutorResponse with replaced text while preserving the prior
conversation chain, so AgentExecutor.from_response is invoked instead.
- Add with_text(text) method to AgentExecutorResponse dataclass
- Add 3 regression tests in test_full_conversation.py
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
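The idea behind with_text() can be sketched on a simplified dataclass. `ExecutorResponse` and its fields are hypothetical and much smaller than the real AgentExecutorResponse; the sketch only shows replacing the text while carrying the prior conversation along.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass
class ExecutorResponse:
    """Hypothetical, simplified stand-in for an agent executor response."""
    text: str
    full_conversation: list[str] = field(default_factory=list)

    def with_text(self, text: str) -> ExecutorResponse:
        """Return a copy with new text and the same conversation history."""
        return replace(self, text=text, full_conversation=list(self.full_conversation))
```

A custom executor could then forward `response.with_text(transformed)` instead of a plain str, so that the downstream handler still receives the conversation chain.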
* Improve workflow unit tests
* Update test name prefix for clarity.
* Update tests to surface any errors.
* fix checkpoint restore-time race in off-thread workflow event stream
* Fixes an intermittent checkpoint-restore race in in-process workflow runs.
The local MCP server can't be used for hosted tools tests because
Anthropic's backend needs to reach the MCP URL from their infrastructure
(not localhost on the CI runner). Revert to learn.microsoft.com/api/mcp
but catch BadRequestError, InternalServerError, APIConnectionError, and
APITimeoutError and pytest.skip so upstream outages don't block the
merge queue.
* Python: use local MCP server for hosted tools test and broaden image assertion
The hosted tools integration test was hitting rate limits on the external
learn.microsoft.com MCP server, causing persistent failures that retries
couldn't recover from. Switch to the local MCP server already spun up in
CI via LOCAL_MCP_URL, skipping when the env var isn't set.
Also broaden the image description assertion to accept common synonyms
(cottage, mansion, villa, etc.) instead of just "house", since the model
legitimately uses varied vocabulary for the same image.
* Address review feedback: validate LOCAL_MCP_URL scheme and use word boundaries
- Skip hosted tools test when LOCAL_MCP_URL lacks http/https scheme,
matching the pattern used in test_mcp.py.
- Use regex word boundaries for image assertion to avoid false matches
like "villain" matching "villa".
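The word-boundary check can be illustrated with a short sketch. The synonym list and the `mentions_house` helper are illustrative assumptions; the actual test may use a longer list.

```python
import re

# Illustrative subset of accepted synonyms.
HOUSE_WORDS = ("house", "cottage", "mansion", "villa")
HOUSE_PATTERN = re.compile(r"\b(?:" + "|".join(HOUSE_WORDS) + r")\b", re.IGNORECASE)


def mentions_house(text: str) -> bool:
    """True if the text contains one of the synonyms as a whole word."""
    return HOUSE_PATTERN.search(text) is not None
```

With the `\b` anchors, "villain" no longer counts as a match for "villa", while "Villa" in any letter case still does.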
The misc-integration job (Anthropic, Ollama, MCP) frequently fails on merge to main when the upstream MCP server (e.g. learn.microsoft.com/api/mcp) returns a transient rate-limit error. The previous 5s retry delay is too short to ride out the upstream backoff window, so all retries fail and the merge queue is blocked. Bumping to 30s gives the upstream a chance to recover before pytest-retry re-runs the test.
* Add agent-framework-gemini package
* Add AGENTS.md documentation
* Add LICENSE file
* Add README.md for agent-framework-gemini package
* Add Google Gemini API keys to .env.example
* Add Google Gemini chat client implementation
* Add tests for GeminiChatClient
* Add Google Gemini agent examples
* Fix client inheritance order
* Update Gemini agent examples
* Update documentation
* Update AGENTS.md
* Add tests for JSON string handling in GeminiChatClient
* Add final response assembly test in GeminiChatClient
* Add tests for handling empty candidates in GeminiChatClient
* Improve Pydantic response handling in GeminiChatClient
* Add tests for function result resolution and callable tool normalization
* Add test for function result resolution when call_id is generated
* Refactor GeminiChatClient to correct inheritance order
Also updates constructor parameter order for environment file handling
* Enhance documentation and clarify Gemini-specific fields
* Update ThinkingConfig with new attributes and type
* Add tests for GoogleSearch and GoogleMaps configs
* Suppress valid-type mypy error on GeminiChatOptionsT
* Move service_url method near overrides
* Order _prepare_config kwargs by base then Gemini-specific
* Use FunctionCallingConfigMode for clarity and type safety
* Fix code_execution doc
* Add agent-framework-gemini to project dependencies
* Remove package from core dependencies
Initial release will be done without agent-framework-gemini in
core[all].
* Move integration tests into one file
* Remove __init__.py file from gemini tests directory
* Introduce RawGeminiChatClient as lightweight chat client
Updated GeminiChatClient to inherit from RawGeminiChatClient, maintaining full functionality with added features.
* Updated variable names from `model_id` to `model`
Across the codebase, including environment variables and client initialization. Adjusted related tests and sample scripts to reflect this change, ensuring consistency in the usage of the Gemini model identifier.
* Update AGENTS.md
* Update Gemini package to alpha status
* Fix docstrings in Gemini tests
* Change 'model_id' to 'model' in response handling
* Fix model property change in response handling
* Add built-in tool factory methods to Gemini client
Replaces boolean tool options (code_execution, google_search_grounding,
google_maps_grounding) with static factory methods that return types.Tool
objects: get_code_interpreter_tool, get_web_search_tool, get_mcp_tool,
get_file_search_tool, and get_maps_grounding_tool.
Simplifies _prepare_tools to a single translation boundary between
FunctionTool (framework) and FunctionDeclaration (Gemini API), with
types.Tool objects passed through unchanged.
* Surface code execution parts
_parse_parts now maps executable_code and code_execution_result
parts to text Content objects so callers can see the code run
and its output. Unknown part types log at debug level rather than
being silently dropped.
* Update Gemini client documentation
* Unify Gemini model name
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
* Update Agent Framework core version
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
* Add Python 3.14 in classifiers
* Replace kwargs with parameters in tool factories
* Refactor chat options handling in Gemini client
* Add tests for handling unknown and consumed keys
* Update Gemini documentation
Now reflects new options and built-in tool factory methods
* Change build system to flit
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
* Fix build system in pyproject.toml
* Fix type checking for generate_content_stream
---------
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
* Python: Skip get_final_response in OTel _finalize_stream when stream errored
When a streaming error occurs, _finalize_stream (a cleanup hook registered by
AgentTelemetryLayer) was unconditionally calling get_final_response(), which
triggers all registered result hooks including after_run context providers.
This caused providers to fire incorrectly on error paths.
Guard against this by checking result_stream._consumed, which is True only after
StopAsyncIteration (normal completion) and False when an exception was raised.
The fix applies to both the chat client and agent telemetry layers.
Closes #5231
* Python: Expose consumed/stream_error on ResponseStream and capture error in OTel span
Address Copilot review feedback on #5232:
- Add `_stream_error: Exception | None` to ResponseStream, set in __anext__'s
except branch so cleanup hooks can inspect the failure.
- Expose public `consumed` and `stream_error` properties to avoid coupling
observability.py to private stream internals.
- Update both _finalize_stream closures (chat and agent layers) to use the
public properties and call capture_exception() with the stream error before
returning early, ensuring the OTel span records the failure rather than
closing silently.
* Python: Address Copilot review feedback on stream error handling
- Use stream_error is not None as the guard in _finalize_stream instead of
not consumed, so the early-return path is keyed precisely to actual errors
rather than any non-normal completion state.
- Clear _stream_error after _run_cleanup_hooks() completes to avoid retaining
the exception traceback (and any large object graphs it references) on the
stream instance beyond the cleanup phase.
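The error-keyed guard and the clearing of the stored exception can be sketched as below. `TrackedStream`, `drain`, and the two sources are hypothetical and far simpler than ResponseStream and the telemetry layers; the sketch only shows a cleanup hook that skips the final response when the stream recorded an error.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Callable


class TrackedStream:
    """Hypothetical wrapper that records a stream error for cleanup hooks."""

    def __init__(self, source: AsyncIterator[str], cleanup: Callable[[TrackedStream], None]) -> None:
        self._source = source
        self._cleanup = cleanup
        self.stream_error: Exception | None = None

    def __aiter__(self) -> TrackedStream:
        return self

    async def __anext__(self) -> str:
        try:
            return await self._source.__anext__()
        except StopAsyncIteration:
            self._cleanup(self)  # normal completion
            raise
        except Exception as exc:
            self.stream_error = exc
            self._cleanup(self)
            self.stream_error = None  # do not retain the traceback after cleanup
            raise


async def ok_source() -> AsyncIterator[str]:
    yield "chunk"


async def failing_source() -> AsyncIterator[str]:
    yield "chunk"
    raise RuntimeError("boom")


async def drain(source: AsyncIterator[str]) -> list[str]:
    """Consume a stream and report what the cleanup hook did."""
    events: list[str] = []

    def finalize(stream: TrackedStream) -> None:
        if stream.stream_error is not None:
            # Record the failure and return early; no final response is built.
            events.append(f"error recorded: {stream.stream_error}")
            return
        events.append("final response")

    try:
        async for _ in TrackedStream(source, finalize):
            pass
    except RuntimeError:
        pass
    return events
```

Running `asyncio.run(drain(ok_source()))` exercises the normal path, and `asyncio.run(drain(failing_source()))` exercises the early-return path.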
* Python: Remove consumed/stream_error properties, use private attrs directly
Per review feedback: since observability.py and _types.py are in the same
package, accessing _stream_error directly is fine and the public properties
are unnecessary.
* Python: Fix Pyright reportPrivateUsage via inline ignore comments
Keep _stream_error private (consistent with rest of ResponseStream), and
suppress reportPrivateUsage at the call sites in observability.py with
inline pyright: ignore comments — access is intentional within the package.