Files
agent-framework/python/samples/02-agents/observability/advanced_zero_code.py
Tao Chen 72a6157c6a [BREAKING] Python: Enable instrumentation by default (#5865)
* Enable instrumentation by default

* Update samples

* Optimization when span is not recording

* Address Copilot comments

* Revert uv.lock

* Add warning

* Formatting

* Fix mypy

* Add disable_instrumentation() with sticky user-intent semantics

Add a public disable_instrumentation() entry point so users can explicitly opt
out of Agent Framework telemetry, with a sticky-disable flag that makes the
user's intent "leading" — no framework code path (foundry's
configure_azure_monitor, configure_otel_providers, enable_instrumentation,
enable_sensitive_telemetry, or direct OBSERVABILITY_SETTINGS.enable_*
writes) can re-enable instrumentation until the user explicitly clears the
disable with enable_instrumentation(force=True) /
enable_sensitive_telemetry(force=True).

Also addresses the two remaining unresolved review threads on the PR:
1. test_observability_settings_defaults_instrumentation_true pins the new
   "ENABLE_INSTRUMENTATION defaults to True when env unset" behavior.
2. test_enable_instrumentation_reads_env_sensitive_data restores coverage
   for the post-import load_dotenv() fallback path.

Implementation:
- ObservabilitySettings.enable_instrumentation / enable_sensitive_data become
  properties backed by _enable_*. While _user_disabled is True, the getters
  return False and the setters drop True writes (defense in depth so third-
  party writes can't subvert the disable).
- Public is_user_disabled read-only property lets integrations (e.g. foundry's
  configure_azure_monitor) cheaply check the disable state without poking at
  privates.
- enable_instrumentation() and enable_sensitive_telemetry() short-circuit with
  an info log when disabled; gain a force=True kwarg that clears the disable.
- configure_otel_providers() still creates providers / exporters / views so a
  later force-enable can use them, but logs an info message when called while
  disabled.
- Foundry's FoundryChatClient.configure_azure_monitor and
  FoundryAgent.configure_azure_monitor early-return when the user has
  disabled, so Azure Monitor's global providers aren't installed unnecessarily.

Tests: 11 new tests covering default-on, env re-read at call time, sticky
behavior against each re-enable surface (enable_instrumentation,
enable_sensitive_telemetry, configure_otel_providers, direct attribute
writes), force=True override, re-arming the disable, and the __all__ export.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: document disable_instrumentation() and force=True paths

Add a "Disabling instrumentation" section to the observability sample README
that walks through:

- The distinction between the ENABLE_INSTRUMENTATION env var (initial,
  non-sticky) and disable_instrumentation() (process-wide, sticky).
- Why the sticky semantics matter: framework integrations like
  FoundryChatClient.configure_azure_monitor() can call
  enable_instrumentation() as part of their setup, and the user's opt-out
  needs to win.
- All five surfaces guarded by the sticky disable (property reads, public
  enable functions, configure_otel_providers, direct attribute writes,
  is_user_disabled-aware integrations).
- The force=True escape hatch on both enable_instrumentation() and
  enable_sensitive_telemetry().
- How third-party integrations should consult OBSERVABILITY_SETTINGS.is_user_disabled.
- The limits of the disable (does not tear down existing providers /
  in-flight spans / third-party instrumentation, does not persist across
  processes).

Cross-links the new section from the ENABLE_INSTRUMENTATION row in the env
vars table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: soften disable_instrumentation() overclaim about telemetry guarantees

Replace 'no telemetry will be emitted no matter what' (which is too strong,
since callers can still pass force=True or mutate private attributes) with
language framing the disable as a user-intent contract that library and
framework code is expected to honor: the framework actively short-circuits
the public enable paths, force=True and private-attribute writes are
acknowledged as out-of-contract escape hatches that integrations should
not use on the user's behalf.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: correct observability Dependencies section

- opentelemetry-sdk is no longer a hard dependency; it is lazily imported by
  create_resource(), create_metric_views(), and configure_otel_providers()
  with a clear ImportError when missing. Day-to-day instrumentation works
  with opentelemetry-api alone provided some other component configures the
  global OpenTelemetry providers (Azure Monitor, an APM agent, application
  bootstrap, etc.).
- opentelemetry-semantic-conventions-ai is no longer used anywhere in the
  source; remove it from the listed dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: replace stale observability migration guide with current PR's only relevant migration

The old guide documented the move away from setup_observability(otlp_endpoint=...)
which was an earlier-release API change unrelated to this PR and stale enough that
it's more confusing than helpful at this point. Replace it with a short note on the
single migration this PR introduces: callers of
enable_instrumentation(enable_sensitive_data=True) should switch to
enable_sensitive_telemetry(). Cross-link to the Disabling instrumentation section
for the rare 'force on without enabling sensitive data' use case where
enable_instrumentation() still applies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-20 11:52:08 +00:00

129 lines
4.8 KiB
Python

# Copyright (c) Microsoft. All rights reserved.
import asyncio
from random import randint
from typing import TYPE_CHECKING, Annotated
from agent_framework import Message, tool
from agent_framework.foundry import FoundryChatClient
from agent_framework.observability import get_tracer
from azure.identity import AzureCliCredential
from dotenv import load_dotenv
from opentelemetry.trace import SpanKind
from opentelemetry.trace.span import format_trace_id
from pydantic import Field
if TYPE_CHECKING:
from agent_framework import SupportsChatGetResponse
"""
This sample shows how you can configure observability of an application with zero code changes.
Agent Framework is natively instrumented with OpenTelemetry, so no auto-instrumentation of the
framework itself is required. Running the `opentelemetry-instrument` CLI wrapper simply configures
the global tracer/meter providers and exporters from environment variables (or CLI flags) at
process startup, so the application code does not need to set them up explicitly. The native
spans/metrics emitted by Agent Framework are then picked up by that globally configured pipeline.
See: https://opentelemetry.io/docs/zero-code/python/
Install the OpenTelemetry CLI tool following the guidance above (when using `uv` there are some
additional steps, so follow the instructions carefully).
Then setup a local OpenTelemetry Collector instance to receive the traces and metrics (and update
the endpoint below).
Then you can run:
```bash
opentelemetry-instrument \
--traces_exporter otlp \
--metrics_exporter otlp \
--service_name agent_framework \
--exporter_otlp_endpoint http://localhost:4317 \
python python/samples/02-agents/observability/advanced_zero_code.py
```
(or use uv run in front when you've done the install within your uv virtual environment)
You can also set the environment variables instead of passing them as CLI arguments.
"""
# Load environment variables from .env file
load_dotenv()
# NOTE: approval_mode="never_require" is for sample brevity.
# Use "always_require" in production; see samples/02-agents/tools/function_tool_with_approval.py
# and samples/02-agents/tools/function_tool_with_approval_and_sessions.py.
@tool(approval_mode="never_require")
async def get_weather(
location: Annotated[str, Field(description="The location to get the weather for.")],
) -> str:
"""Get the weather for a given location."""
await asyncio.sleep(randint(0, 10) / 10.0) # Simulate a network call
conditions = ["sunny", "cloudy", "rainy", "stormy"]
return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."
async def run_chat_client(client: "SupportsChatGetResponse", stream: bool = False) -> None:
"""Run an AI service.
This function runs an AI service and prints the output.
Telemetry will be collected for the service execution behind the scenes,
and the traces will be sent to the configured telemetry backend.
The telemetry will include information about the AI service execution.
Args:
stream: Whether to use streaming for the plugin
Remarks:
When `FunctionInvocationLayer` is outside `ChatTelemetryLayer`,
each call to the model is handled as a separate span.
If `ChatMiddlewareLayer` is present, keep it outside telemetry
so middleware latency does not skew those timings.
By contrast, when telemetry is placed outside the function loop,
a single span can cover one or more rounds of function calling.
So for the scenario below, you should see the following:
2 spans with gen_ai.operation.name=chat
The first has finish_reason "tool_calls"
The second has finish_reason "stop"
2 spans with gen_ai.operation.name=execute_tool
"""
message = "What's the weather in Amsterdam and in Paris?"
print(f"User: {message}")
if stream:
print("Assistant: ", end="")
async for chunk in client.get_response(
[Message(role="user", contents=[message])],
stream=True,
options={"tools": [get_weather]},
):
if chunk.text:
print(chunk.text, end="")
print("")
else:
response = await client.get_response(
[Message(role="user", contents=[message])],
options={"tools": [get_weather]},
)
print(f"Assistant: {response}")
async def main() -> None:
with get_tracer().start_as_current_span("Zero Code", kind=SpanKind.CLIENT) as current_span:
print(f"Trace ID: {format_trace_id(current_span.get_span_context().trace_id)}")
client = FoundryChatClient(credential=AzureCliCredential())
await run_chat_client(client, stream=True)
await run_chat_client(client, stream=False)
if __name__ == "__main__":
asyncio.run(main())