# Copyright (c) Microsoft. All rights reserved.

import asyncio
from random import randint
from typing import TYPE_CHECKING, Annotated

from agent_framework import Message, tool
from agent_framework.foundry import FoundryChatClient
from agent_framework.observability import get_tracer
from azure.identity import AzureCliCredential
from dotenv import load_dotenv
from opentelemetry.trace import SpanKind
from opentelemetry.trace.span import format_trace_id
from pydantic import Field

if TYPE_CHECKING:
    from agent_framework import SupportsChatGetResponse


"""
This sample shows how you can configure observability of an application with zero code changes.

Agent Framework is natively instrumented with OpenTelemetry, so no auto-instrumentation of the
framework itself is required. Running the `opentelemetry-instrument` CLI wrapper simply configures
the global tracer/meter providers and exporters from environment variables (or CLI flags) at
process startup, so the application code does not need to set them up explicitly. The native
spans/metrics emitted by Agent Framework are then picked up by that globally configured pipeline.

See: https://opentelemetry.io/docs/zero-code/python/

Install the OpenTelemetry CLI tool following the guidance above (when using `uv` there are some
additional steps, so follow the instructions carefully).

Then set up a local OpenTelemetry Collector instance to receive the traces and metrics (and update
the endpoint below).

Then you can run:
```bash
opentelemetry-instrument \
    --traces_exporter otlp \
    --metrics_exporter otlp \
    --service_name agent_framework \
    --exporter_otlp_endpoint http://localhost:4317 \
    python python/samples/02-agents/observability/advanced_zero_code.py
```
(or prefix the command with `uv run` if you installed the tool into your uv virtual environment)

You can also set the environment variables instead of passing them as CLI arguments.
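
As a sketch of that alternative, the CLI flags above map onto the standard OpenTelemetry
environment variables below. The values assume the same local collector endpoint and service
name as the command above; adjust them for your setup:

```bash
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_SERVICE_NAME=agent_framework
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

With these set, running `opentelemetry-instrument python <path to this sample>` without any
flags should configure the same exporters.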
"""

# Load environment variables from .env file
load_dotenv()


# NOTE: approval_mode="never_require" is for sample brevity.
# Use "always_require" in production; see samples/02-agents/tools/function_tool_with_approval.py
# and samples/02-agents/tools/function_tool_with_approval_and_sessions.py.
@tool(approval_mode="never_require")
async def get_weather(
    location: Annotated[str, Field(description="The location to get the weather for.")],
) -> str:
    """Get the weather for a given location."""
    await asyncio.sleep(randint(0, 10) / 10.0)  # Simulate a network call
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


async def run_chat_client(client: "SupportsChatGetResponse", stream: bool = False) -> None:
    """Run an AI service.

    This function runs an AI service and prints the output.
    Telemetry will be collected for the service execution behind the scenes,
    and the traces will be sent to the configured telemetry backend.

    The telemetry will include information about the AI service execution.

    Args:
        client: The chat client to run.
        stream: Whether to stream the response.

    Remarks:
        When `FunctionInvocationLayer` is outside `ChatTelemetryLayer`,
        each call to the model is handled as a separate span.
        If `ChatMiddlewareLayer` is present, keep it outside telemetry
        so middleware latency does not skew those timings.
        By contrast, when telemetry is placed outside the function loop,
        a single span can cover one or more rounds of function calling.

        So for the scenario below, you should see the following:

        2 spans with gen_ai.operation.name=chat
            The first has finish_reason "tool_calls"
            The second has finish_reason "stop"
        2 spans with gen_ai.operation.name=execute_tool

    """
    message = "What's the weather in Amsterdam and in Paris?"
    print(f"User: {message}")
    if stream:
        print("Assistant: ", end="")
        async for chunk in client.get_response(
            [Message(role="user", contents=[message])],
            stream=True,
            options={"tools": [get_weather]},
        ):
            if chunk.text:
                print(chunk.text, end="")
        print("")
    else:
        response = await client.get_response(
            [Message(role="user", contents=[message])],
            options={"tools": [get_weather]},
        )
        print(f"Assistant: {response}")


async def main() -> None:
    with get_tracer().start_as_current_span("Zero Code", kind=SpanKind.CLIENT) as current_span:
        print(f"Trace ID: {format_trace_id(current_span.get_span_context().trace_id)}")

        client = FoundryChatClient(credential=AzureCliCredential())

        await run_chat_client(client, stream=True)
        await run_chat_client(client, stream=False)


if __name__ == "__main__":
    asyncio.run(main())