mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
72a6157c6a
* Enable instrumentation by default * Update samples * Optimization when span is not recording * Address Copilot comments * Revert uv.lock * Add warning * Formatting * Fix mypy * Add disable_instrumentation() with sticky user-intent semantics Add a public disable_instrumentation() entry point so users can explicitly opt out of Agent Framework telemetry, with a sticky-disable flag that makes the user's intent "leading" — no framework code path (foundry's configure_azure_monitor, configure_otel_providers, enable_instrumentation, enable_sensitive_telemetry, or direct OBSERVABILITY_SETTINGS.enable_* writes) can re-enable instrumentation until the user explicitly clears the disable with enable_instrumentation(force=True) / enable_sensitive_telemetry(force=True). Also addresses the two remaining unresolved review threads on the PR: 1. test_observability_settings_defaults_instrumentation_true pins the new "ENABLE_INSTRUMENTATION defaults to True when env unset" behavior. 2. test_enable_instrumentation_reads_env_sensitive_data restores coverage for the post-import load_dotenv() fallback path. Implementation: - ObservabilitySettings.enable_instrumentation / enable_sensitive_data become properties backed by _enable_*. While _user_disabled is True, the getters return False and the setters drop True writes (defense in depth so third- party writes can't subvert the disable). - Public is_user_disabled read-only property lets integrations (e.g. foundry's configure_azure_monitor) cheaply check the disable state without poking at privates. - enable_instrumentation() and enable_sensitive_telemetry() short-circuit with an info log when disabled; gain a force=True kwarg that clears the disable. - configure_otel_providers() still creates providers / exporters / views so a later force-enable can use them, but logs an info message when called while disabled. - Foundry's FoundryChatClient.configure_azure_monitor and FoundryAgent.configure_azure_monitor early-return when the user has disabled, so Azure Monitor's global providers aren't installed unnecessarily. Tests: 11 new tests covering default-on, env re-read at call time, sticky behavior against each re-enable surface (enable_instrumentation, enable_sensitive_telemetry, configure_otel_providers, direct attribute writes), force=True override, re-arming the disable, and the __all__ export. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: document disable_instrumentation() and force=True paths Add a "Disabling instrumentation" section to the observability sample README that walks through: - The distinction between the ENABLE_INSTRUMENTATION env var (initial, non-sticky) and disable_instrumentation() (process-wide, sticky). - Why the sticky semantics matter: framework integrations like FoundryChatClient.configure_azure_monitor() can call enable_instrumentation() as part of their setup, and the user's opt-out needs to win. - All five surfaces guarded by the sticky disable (property reads, public enable functions, configure_otel_providers, direct attribute writes, is_user_disabled-aware integrations). - The force=True escape hatch on both enable_instrumentation() and enable_sensitive_telemetry(). - How third-party integrations should consult OBSERVABILITY_SETTINGS.is_user_disabled. - The limits of the disable (does not tear down existing providers / in-flight spans / third-party instrumentation, does not persist across processes). Cross-links the new section from the ENABLE_INSTRUMENTATION row in the env vars table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: soften disable_instrumentation() overclaim about telemetry guarantees Replace 'no telemetry will be emitted no matter what' (which is too strong, since callers can still pass force=True or mutate private attributes) with language framing the disable as a user-intent contract that library and framework code is expected to honor: the framework actively short-circuits the public enable paths, force=True and private-attribute writes are acknowledged as out-of-contract escape hatches that integrations should not use on the user's behalf. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: correct observability Dependencies section - opentelemetry-sdk is no longer a hard dependency; it is lazily imported by create_resource(), create_metric_views(), and configure_otel_providers() with a clear ImportError when missing. Day-to-day instrumentation works with opentelemetry-api alone provided some other component configures the global OpenTelemetry providers (Azure Monitor, an APM agent, application bootstrap, etc.). - opentelemetry-semantic-conventions-ai is no longer used anywhere in the source; remove it from the listed dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: replace stale observability migration guide with current PR's only relevant migration The old guide documented the move away from setup_observability(otlp_endpoint=...) which was an earlier-release API change unrelated to this PR and stale enough that it's more confusing than helpful at this point. Replace it with a short note on the single migration this PR introduces: callers of enable_instrumentation(enable_sensitive_data=True) should switch to enable_sensitive_telemetry(). Cross-link to the Disabling instrumentation section for the rare 'force on without enabling sensitive data' use case where enable_instrumentation() still applies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
460 lines
26 KiB
Markdown
460 lines
26 KiB
Markdown
# Agent Framework Observability
|
|
|
|
These samples show how to send Agent Framework observability data to the Application Performance Management (APM) backend of your choice, based on the OpenTelemetry standard.
|
|
|
|
The samples target [Application Insights](https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview), the [Aspire Dashboard](https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/overview?tabs=bash), and the console, but any OTLP-compatible backend works.
|
|
|
|
> **Quick Start**: For local development without Azure setup, use the [Aspire Dashboard](https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/standalone) (runs locally via Docker), or the built-in tracing module of the [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio).
|
|
|
|
> Other backends such as [Prometheus](https://prometheus.io/docs/introduction/overview/) are also supported. See the [OpenTelemetry Python exporters](https://opentelemetry.io/docs/languages/python/exporters/) page for the full list.
|
|
|
|
For more information, please refer to the following resources:
|
|
|
|
1. [Azure Monitor OpenTelemetry Exporter](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/monitor/azure-monitor-opentelemetry-exporter)
|
|
2. [Aspire Dashboard for Python Apps](https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/standalone-for-python?tabs=flask%2Cwindows)
|
|
3. [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio)
|
|
4. [Python Logging](https://docs.python.org/3/library/logging.html)
|
|
5. [Observability in Python](https://www.cncf.io/blog/2022/04/22/opentelemetry-and-python-a-complete-instrumentation-guide/)
|
|
|
|
## What to expect
|
|
|
|
The Agent Framework Python SDK is **natively instrumented** to emit logs, traces, and metrics throughout agent/model invocation and tool execution, so you can monitor your AI application's performance and track token consumption. Instrumentation follows the OpenTelemetry [Semantic Conventions for GenAI](https://opentelemetry.io/docs/specs/semconv/gen-ai/), and workflows emit their own spans for end-to-end visibility.
|
|
|
|
Setting up observability is also easy: a single call to `configure_otel_providers()` from the `agent_framework.observability` module wires up the trace, log, and metric providers. It reads the standard OpenTelemetry environment variables to configure exporters automatically.
|
|
|
|
### Five patterns for configuring observability
|
|
|
|
> Setting up observability has two parts: (1) **instrumentation**, the code that generates telemetry, and (2) **exporter/provider configuration**, which decides where that telemetry is sent. Agent Framework is natively instrumented and **enabled by default**, so you only need to handle the second part.
|
|
|
|
There are five common ways to do that, depending on your needs:
|
|
|
|
**1. Standard otel environment variables, configured for you**
|
|
|
|
The simplest approach - configure everything via environment variables:
|
|
|
|
```python
|
|
from agent_framework.observability import configure_otel_providers
|
|
|
|
# Reads OTEL_EXPORTER_OTLP_* environment variables automatically
|
|
configure_otel_providers()
|
|
```
|
|
|
|
Or if you just want console exporters:
|
|
|
|
```python
|
|
from agent_framework.observability import configure_otel_providers
|
|
|
|
configure_otel_providers(enable_console_exporters=True)
|
|
# It is also possible to set ENABLE_CONSOLE_EXPORTERS=true in environment
|
|
# variables instead of calling `configure_otel_providers()` with the parameter.
|
|
# The framework will automatically read that and set up console exporters.
|
|
```
|
|
|
|
This is the **recommended approach** for getting started.
|
|
|
|
**2. Custom Exporters**
|
|
|
|
For more control, construct exporters yourself and pass them to `configure_otel_providers()`. The framework still creates the providers for you:
|
|
|
|
```python
|
|
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
|
|
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
|
|
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
|
|
from opentelemetry.exporter.otlp.proto.grpc.exporter import Compression
|
|
from agent_framework.observability import configure_otel_providers
|
|
|
|
# Create custom exporters with specific configuration
|
|
exporters = [
|
|
OTLPSpanExporter(endpoint="http://localhost:4317", compression=Compression.Gzip),
|
|
OTLPLogExporter(endpoint="http://localhost:4317"),
|
|
OTLPMetricExporter(endpoint="http://localhost:4317"),
|
|
]
|
|
|
|
# These are added alongside any exporters configured from environment variables
|
|
configure_otel_providers(exporters=exporters)
|
|
```
|
|
|
|
**3. Third-party setup**
|
|
|
|
Many third-party OTel packages ship their own setup helpers (for example, Azure Monitor's `configure_azure_monitor()`). You can use those directly — Agent Framework instrumentation is on by default, so no extra wiring is needed. To also capture sensitive data, call `enable_sensitive_telemetry()` from `agent_framework.observability`.
|
|
|
|
```python
|
|
from azure.monitor.opentelemetry import configure_azure_monitor
|
|
from agent_framework.observability import create_resource, enable_sensitive_telemetry
|
|
|
|
# Configure Azure Monitor first
|
|
configure_azure_monitor(
|
|
connection_string="InstrumentationKey=...",
|
|
resource=create_resource(), # Uses OTEL_SERVICE_NAME, etc.
|
|
enable_live_metrics=True,
|
|
)
|
|
|
|
# Optional: opt in to capturing sensitive data
|
|
enable_sensitive_telemetry()
|
|
```
|
|
|
|
For Microsoft Foundry projects, use `client.configure_azure_monitor()` which retrieves the connection string from the project and configures everything:
|
|
|
|
```python
|
|
from agent_framework.foundry import FoundryChatClient
|
|
from azure.identity import AzureCliCredential
|
|
|
|
client = FoundryChatClient(
|
|
project_endpoint="https://your-project.services.ai.azure.com",
|
|
model="gpt-4o",
|
|
credential=AzureCliCredential(),
|
|
)
|
|
|
|
# Automatically configures Azure Monitor with connection string from project
|
|
await client.configure_azure_monitor(enable_sensitive_data=True)
|
|
```
|
|
|
|
Or with [Langfuse](https://langfuse.com/integrations/frameworks/microsoft-agent-framework):
|
|
|
|
```python
|
|
# environment should be setup correctly, with langfuse urls and keys
|
|
from agent_framework.observability import enable_sensitive_telemetry
|
|
from langfuse import get_client
|
|
|
|
langfuse = get_client()
|
|
|
|
# Verify connection
|
|
if langfuse.auth_check():
|
|
print("Langfuse client is authenticated and ready!")
|
|
else:
|
|
print("Authentication failed. Please check your credentials and host.")
|
|
|
|
# Agent Framework instrumentation is on by default.
|
|
# Optional: opt in to capturing sensitive data
|
|
enable_sensitive_telemetry()
|
|
```
|
|
|
|
Or with [Comet Opik](https://www.comet.com/docs/opik/integrations/microsoft-agent-framework):
|
|
|
|
```python
|
|
import os
|
|
|
|
from agent_framework.observability import enable_sensitive_telemetry
|
|
|
|
# Use Opik OTLP settings from your project settings
|
|
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "<opik_otlp_endpoint>"
|
|
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "<opik_otlp_headers>"
|
|
|
|
# Agent Framework instrumentation is on by default.
|
|
# Optional: opt in to capturing sensitive data
|
|
enable_sensitive_telemetry()
|
|
```
|
|
|
|
**4. Manual setup**
|
|
|
|
For full control, set up providers and exporters yourself. See [advanced_manual_setup_console_output.py](./advanced_manual_setup_console_output.py) for a complete example that sends traces, logs, and metrics to the console. The `create_resource()` helper in `agent_framework.observability` can build a resource with the appropriate service name and version from environment variables (or sensible defaults), although the sample does not use it.
|
|
|
|
**5. Zero-code provider/exporter configuration**
|
|
|
|
Because Agent Framework is **natively instrumented** with OpenTelemetry, you do not need to auto-instrument the framework itself. You can, however, use the [`opentelemetry-instrument`](https://opentelemetry.io/docs/zero-code/python/) CLI wrapper to configure the global tracer/meter providers and exporters from environment variables (or CLI flags) at process startup. Your application code then does not need to call `configure_otel_providers()` — the native spans and metrics from Agent Framework are picked up by the globally configured pipeline. See [advanced_zero_code.py](./advanced_zero_code.py) for an example.
|
|
|
|
### MCP trace propagation
|
|
|
|
Whenever there is an active OpenTelemetry span context, Agent Framework automatically propagates trace context to MCP servers via the `params._meta` field of `tools/call` requests. It uses the globally configured OpenTelemetry propagator(s) — W3C Trace Context by default (producing `traceparent` and `tracestate`) — so custom propagators (B3, Jaeger, etc.) are also supported. This enables distributed tracing across agent-to-MCP-server boundaries, compliant with the [MCP `_meta` specification](https://modelcontextprotocol.io/specification/2025-11-25/basic#_meta).
|
|
|
|
**Scope:** automatic `_meta` injection applies only to MCP sessions that the agent process itself opens — `MCPStreamableHTTPTool`, `MCPStdioTool`, and `MCPWebsocketTool` (or any other client-opened `MCPTool` subclass). It does **not** apply to hosted or provider-managed MCP tool configurations such as `FoundryChatClient.get_mcp_tool(...)`, `OpenAIChatClient.get_mcp_tool(...)`, `AnthropicClient.get_mcp_tool(...)`, `GeminiChatClient.get_mcp_tool(...)`, or toolbox-fetched tools (e.g. `toolbox = await client.get_toolbox(...)` then `Agent(tools=toolbox.tools)`). In those cases the `tools/call` message is issued by the provider service runtime rather than by the agent process, so propagating `traceparent`/`tracestate` across that boundary is the service runtime's responsibility. If you need end-to-end distributed tracing to the downstream MCP server, use a client-opened MCP transport instead of a hosted connector.
|
|
|
|
## Configuration
|
|
|
|
### Dependencies
|
|
|
|
Agent Framework's core depends on **`opentelemetry-api`** only — the API package is enough for the instrumentation hooks (spans, meters, log records) to emit telemetry, and it has no runtime side effects when no provider is configured.
|
|
|
|
If you want the framework to set up providers / exporters for you via `configure_otel_providers()` (or to use the `create_resource()` / `create_metric_views()` helpers), you also need the OpenTelemetry SDK:
|
|
|
|
```bash
|
|
pip install opentelemetry-sdk
|
|
```
|
|
|
|
If `opentelemetry-sdk` is missing, those helper functions raise a clear `ImportError` telling you to install it. Day-to-day instrumentation still works without the SDK as long as some other component (e.g. `azure-monitor-opentelemetry`, your application bootstrap, an APM agent) has configured the global OpenTelemetry providers.
|
|
|
|
Exporters are **not** installed by default — install only what you need:
|
|
- **Application Insights**: `azure-monitor-opentelemetry`
|
|
- **Aspire Dashboard or other OTLP/gRPC backends**: `opentelemetry-exporter-otlp-proto-grpc`
|
|
- **OTLP over HTTP**: `opentelemetry-exporter-otlp-proto-http`
|
|
|
|
For other backends, refer to the documentation of the specific exporter.
|
|
|
|
### Environment variables
|
|
|
|
Agent Framework reads the following environment variables:
|
|
|
|
| Variable | Default | Purpose |
|
|
|----------|---------|---------|
|
|
| `ENABLE_INSTRUMENTATION` | `true` | Set to `false` to disable native instrumentation. See [Disabling instrumentation](#disabling-instrumentation) for the programmatic alternative with sticky semantics. |
|
|
| `ENABLE_SENSITIVE_DATA` | `false` | Set to `true` to emit sensitive data (prompts, responses, etc.). |
|
|
| `ENABLE_CONSOLE_EXPORTERS` | `false` | Set to `true` to add console exporters. Only used by `configure_otel_providers()`. |
|
|
| `VS_CODE_EXTENSION_PORT` | unset | Port used by the [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio#tracing) tracing integration. Only used by `configure_otel_providers()`. |
|
|
|
|
You can also call `enable_sensitive_telemetry()` from `agent_framework.observability` to opt in to sensitive-data capture programmatically.
|
|
|
|
> **Note**: Sensitive data includes prompts, responses, and tool arguments. Only enable it in development or test environments — it may expose user or system secrets in production.
|
|
|
|
### Disabling instrumentation
|
|
|
|
There are two ways to turn Agent Framework's native instrumentation off, and they have **different scopes**:
|
|
|
|
| Approach | Scope | Sticky? | When framework code calls `enable_instrumentation()` later, what happens? |
|
|
|----------|-------|---------|---------------------------------------------------------------------------|
|
|
| `ENABLE_INSTRUMENTATION=false` in the environment | Initial settings only | No | Instrumentation flips back **on**. |
|
|
| `disable_instrumentation()` called from code | Process-wide, sticky | Yes | Instrumentation **stays off** — the user-disable intent wins. |
|
|
|
|
If you want telemetry off **and want it to stay off**, use `disable_instrumentation()`.
|
|
|
|
#### Sticky semantics — why this matters
|
|
|
|
Framework integrations and third-party libraries can call `enable_instrumentation()`, `enable_sensitive_telemetry()`, or `configure_otel_providers()` as part of their own setup. For example, `FoundryChatClient.configure_azure_monitor()` calls `enable_instrumentation()` after wiring up Azure Monitor. That's normally what you want — but if **you** have explicitly opted out, you don't want any of those calls to silently re-enable telemetry.
|
|
|
|
`disable_instrumentation()` solves this by setting a **sticky** flag on `OBSERVABILITY_SETTINGS` that remains in effect until you explicitly clear it. While the flag is set:
|
|
|
|
1. `OBSERVABILITY_SETTINGS.enable_instrumentation` and `enable_sensitive_data` **read as `False`** regardless of the stored value.
|
|
2. `enable_instrumentation()` and `enable_sensitive_telemetry()` are **no-ops** and log an info-level message.
|
|
3. `configure_otel_providers()` still configures providers / exporters / views (so a later force-enable can use them), but does not flip instrumentation on.
|
|
4. Direct attribute writes like `OBSERVABILITY_SETTINGS.enable_instrumentation = True` from any code are **silently dropped** (defense in depth).
|
|
5. Integrations that consult `OBSERVABILITY_SETTINGS.is_user_disabled` (e.g. `FoundryChatClient.configure_azure_monitor()`, `FoundryAgent.configure_azure_monitor()`) **skip their setup entirely**, so global Azure Monitor providers aren't installed unnecessarily.
|
|
|
|
```python
|
|
from agent_framework.observability import disable_instrumentation
|
|
|
|
# After this call, Agent Framework expresses your intent to opt out of telemetry.
|
|
# Library and framework code is expected to honor that intent and not flip
|
|
# instrumentation back on (e.g. by calling `enable_instrumentation()`,
|
|
# `enable_sensitive_telemetry()`, or writing to public attributes on
|
|
# `OBSERVABILITY_SETTINGS`). The framework actively short-circuits the public
|
|
# enable paths so the user's intent stays leading. A determined caller can still
|
|
# pass `force=True` or mutate private (`_`-prefixed) attributes to bypass it,
|
|
# but those are out-of-contract escape hatches that should not be used by
|
|
# integrations on the user's behalf.
|
|
disable_instrumentation()
|
|
```
|
|
|
|
#### Forcing re-enablement after a disable
|
|
|
|
To intentionally re-enable telemetry after `disable_instrumentation()`, pass `force=True` to either of the two public enable helpers. This is the only way to clear the sticky disable, so the user's opt-out can only be reversed by a deliberate user opt-in:
|
|
|
|
```python
|
|
from agent_framework.observability import (
|
|
disable_instrumentation,
|
|
enable_instrumentation,
|
|
enable_sensitive_telemetry,
|
|
)
|
|
|
|
disable_instrumentation()
|
|
|
|
# Without force=True, these are no-ops while the disable is sticky:
|
|
enable_instrumentation() # logs info, does nothing
|
|
enable_sensitive_telemetry() # logs info, does nothing
|
|
|
|
# With force=True, the sticky disable is cleared and the call proceeds:
|
|
enable_instrumentation(force=True)
|
|
# or
|
|
enable_sensitive_telemetry(force=True)
|
|
|
|
# After a force-enable you can `disable_instrumentation()` again to re-arm
|
|
# the sticky disable.
|
|
```
|
|
|
|
#### Checking the disable state from integrations
|
|
|
|
If you're writing an integration that performs telemetry setup as a side effect (e.g. provisioning a third-party exporter), consult the public read-only `is_user_disabled` property and early-return when it's set:
|
|
|
|
```python
|
|
from agent_framework.observability import OBSERVABILITY_SETTINGS
|
|
|
|
if OBSERVABILITY_SETTINGS.is_user_disabled:
|
|
logger.info(
|
|
"Skipping telemetry setup because the user called disable_instrumentation()."
|
|
)
|
|
return
|
|
```
|
|
|
|
This is what the built-in `FoundryChatClient.configure_azure_monitor()` and `FoundryAgent.configure_azure_monitor()` do — so calling `disable_instrumentation()` reliably prevents Azure Monitor's global providers from being installed by those helpers.
|
|
|
|
#### What `disable_instrumentation()` does **not** do
|
|
|
|
- It does not tear down OpenTelemetry providers, exporters, or in-flight spans that were already set up before the disable call. It only gates **future** captures by Agent Framework code paths.
|
|
- It does not stop telemetry from third-party instrumentations (e.g. `azure-monitor-opentelemetry`'s system metrics) that are wired up outside Agent Framework. Configure those separately if needed.
|
|
- It does not persist across processes. Each Python process starts with the disable flag cleared; if you always want telemetry off in a given environment, set `ENABLE_INSTRUMENTATION=false` as an environment variable in addition to (or instead of) the programmatic call.
|
|
|
|
#### Environment variables for `configure_otel_providers()`
|
|
|
|
The `configure_otel_providers()` function automatically reads **standard OpenTelemetry environment variables** to configure exporters:
|
|
|
|
**OTLP Configuration** (for Aspire Dashboard, Jaeger, etc.):
|
|
- `OTEL_EXPORTER_OTLP_ENDPOINT` - Base endpoint for all signals (e.g., `http://localhost:4317`)
|
|
- `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` - Traces-specific endpoint (overrides base)
|
|
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` - Metrics-specific endpoint (overrides base)
|
|
- `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` - Logs-specific endpoint (overrides base)
|
|
- `OTEL_EXPORTER_OTLP_PROTOCOL` - Protocol to use (`grpc` or `http`, default: `grpc`)
|
|
- `OTEL_EXPORTER_OTLP_HEADERS` - Headers for all signals (e.g., `key1=value1,key2=value2`)
|
|
- `OTEL_EXPORTER_OTLP_TRACES_HEADERS` - Traces-specific headers (overrides base)
|
|
- `OTEL_EXPORTER_OTLP_METRICS_HEADERS` - Metrics-specific headers (overrides base)
|
|
- `OTEL_EXPORTER_OTLP_LOGS_HEADERS` - Logs-specific headers (overrides base)
|
|
|
|
**Service Identification**:
|
|
- `OTEL_SERVICE_NAME` - Service name (default: `agent_framework`)
|
|
- `OTEL_SERVICE_VERSION` - Service version (default: package version)
|
|
- `OTEL_RESOURCE_ATTRIBUTES` - Additional resource attributes (e.g., `key1=value1,key2=value2`)
|
|
|
|
> **Note**: These are standard OpenTelemetry environment variables. See the [OpenTelemetry spec](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/) for more details.
|
|
|
|
#### Logging
|
|
|
|
Use standard Python logging configuration to align logs with telemetry output:
|
|
|
|
```python
|
|
import logging
|
|
|
|
logging.basicConfig(
|
|
format="[%(asctime)s - %(pathname)s:%(lineno)d - %(levelname)s] %(message)s",
|
|
datefmt="%Y-%m-%d %H:%M:%S",
|
|
)
|
|
```
|
|
|
|
To control which logs are exported, adjust the root logger level — other loggers inherit from it by default:
|
|
|
|
```python
|
|
import logging
|
|
|
|
logging.getLogger().setLevel(logging.NOTSET)
|
|
```
|
|
|
|
## Samples
|
|
|
|
This folder contains different samples demonstrating how to use telemetry in various scenarios.
|
|
|
|
| Sample | Description |
|
|
|--------|-------------|
|
|
| [configure_otel_providers_with_env_var.py](./configure_otel_providers_with_env_var.py) | **Recommended starting point**: configure telemetry using standard OpenTelemetry environment variables (`OTEL_EXPORTER_OTLP_*`). |
|
|
| [configure_otel_providers_with_parameters.py](./configure_otel_providers_with_parameters.py) | Create custom exporters with specific configuration and pass them to `configure_otel_providers()`. |
|
|
| [agent_observability.py](./agent_observability.py) | Telemetry collection for an agentic application with tool calls. |
|
|
| [foundry_tracing.py](./foundry_tracing.py) | Azure Monitor integration with Microsoft Foundry. |
|
|
| [workflow_observability.py](./workflow_observability.py) | Telemetry collection for a workflow with multiple executors and message passing. |
|
|
| [advanced_manual_setup_console_output.py](./advanced_manual_setup_console_output.py) | Advanced: manual setup of exporters and providers with console output — useful for understanding how observability works under the hood. |
|
|
| [advanced_zero_code.py](./advanced_zero_code.py) | Advanced: zero-code provider/exporter setup using the `opentelemetry-instrument` CLI wrapper. |
|
|
|
|
### Running the samples
|
|
|
|
1. Open a terminal in this folder (`python/samples/02-agents/observability/`) so that `.env` is found.
|
|
2. Create a `.env` file if you don't already have one. See [.env.example](./.env.example).
|
|
> Instrumentation is on by default. Set `OTEL_EXPORTER_OTLP_ENDPOINT` (or other configuration) as needed. With no exporters configured, set `ENABLE_CONSOLE_EXPORTERS=true` for console output.
|
|
3. Pick an environment-loading approach:
|
|
- **A. Sample-managed:** run from this folder so the sample's `load_dotenv()` call can find `.env`.
|
|
- **B. Shell/IDE-managed:** export environment variables, or use an IDE run configuration that injects them.
|
|
- **C. Explicit env file in code:** pass `env_file_path` to APIs like `configure_otel_providers(env_file_path=".env")`.
|
|
- **D. CLI-managed:** run with `uv` and pass the file explicitly, e.g. `uv run --env-file=.env python configure_otel_providers_with_env_var.py`.
|
|
4. Activate your virtual environment, then run a sample (e.g. `python configure_otel_providers_with_env_var.py`).
|
|
|
|
> If you set up providers manually (e.g. Azure Monitor), Agent Framework instrumentation is still on by default. Call `enable_sensitive_telemetry()` if you also want to capture sensitive data. To have Agent Framework configure exporters and providers for you, call `configure_otel_providers(...)`.
|
|
|
|
> Each sample prints its Operation/Trace ID, which you can use to filter logs and traces in Application Insights or the Aspire Dashboard.
|
|
|
|
# Appendix
|
|
|
|
## Azure Monitor Queries
|
|
|
|
For an overall view of a span in Azure Monitor, run this query in the Logs section:
|
|
|
|
```kusto
|
|
dependencies
|
|
| where operation_Id in (dependencies
|
|
| project operation_Id, timestamp
|
|
| order by timestamp desc
|
|
| summarize operations = make_set(operation_Id), timestamp = max(timestamp) by operation_Id
|
|
| order by timestamp desc
|
|
| project operation_Id
|
|
| take 2)
|
|
| evaluate bag_unpack(customDimensions)
|
|
| extend tool_call_id = tostring(["gen_ai.tool.call.id"])
|
|
| join kind=leftouter (customMetrics
|
|
| extend tool_call_id = tostring(customDimensions['gen_ai.tool.call.id'])
|
|
| where isnotempty(tool_call_id)
|
|
| project tool_call_duration = value, tool_call_id)
|
|
on tool_call_id
|
|
| project-keep timestamp, target, operation_Id, tool_call_duration, duration, gen_ai*
|
|
| order by timestamp asc
|
|
```
|
|
|
|
### Grafana dashboards with Application Insights data
|
|
|
|
In addition to the native Application Insights UI, you can use Grafana to visualize the same telemetry data. Two tailored dashboards are available to get you started:
|
|
|
|
#### Agent Overview dashboard
|
|
Open dashboard in Azure portal: <https://aka.ms/amg/dash/af-agent>
|
|

|
|
|
|
#### Workflow Overview dashboard
|
|
Open dashboard in Azure portal: <https://aka.ms/amg/dash/af-workflow>
|
|

|
|
|
|
## Migration Guide
|
|
|
|
Instrumentation is now **enabled by default** (you no longer have to opt in by calling `enable_instrumentation()` at startup), and the way you opt in to capturing sensitive payloads has its own dedicated function.
|
|
|
|
If your code previously did:
|
|
|
|
```python
|
|
from agent_framework.observability import enable_instrumentation
|
|
|
|
enable_instrumentation(enable_sensitive_data=True)
|
|
```
|
|
|
|
replace it with:
|
|
|
|
```python
|
|
from agent_framework.observability import enable_sensitive_telemetry
|
|
|
|
enable_sensitive_telemetry()
|
|
```
|
|
|
|
`enable_sensitive_telemetry()` ensures that instrumentation is on and turns sensitive-event capture on in one call. `enable_instrumentation()` still exists for the rare case where you want to programmatically force instrumentation on without enabling sensitive data (e.g. to override `ENABLE_INSTRUMENTATION=false`), and it now also accepts `force=True` to clear a previous `disable_instrumentation()` — see [Disabling instrumentation](#disabling-instrumentation).
|
|
|
|
> **Note**: Sensitive data includes prompts, responses, and tool arguments. Only enable it in development or test environments — it may expose user or system secrets in production.
|
|
|
|
## Aspire Dashboard
|
|
|
|
The [Aspire Dashboard](https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/standalone) is a local telemetry viewing tool that provides an excellent experience for viewing OpenTelemetry data without requiring Azure setup.
|
|
|
|
### Setting up Aspire Dashboard with Docker
|
|
|
|
The easiest way to run the Aspire Dashboard locally is using Docker:
|
|
|
|
```bash
|
|
# Pull and run the Aspire Dashboard container
|
|
docker run --rm -it -d \
|
|
-p 18888:18888 \
|
|
-p 4317:18889 \
|
|
--name aspire-dashboard \
|
|
mcr.microsoft.com/dotnet/aspire-dashboard:latest
|
|
```
|
|
|
|
This will start the dashboard with:
|
|
|
|
- **Web UI**: Available at <http://localhost:18888>
|
|
- **OTLP endpoint**: Available at `http://localhost:4317` for your applications to send telemetry data
|
|
|
|
### Configuring your application
|
|
|
|
Make sure your `.env` file includes the OTLP endpoint:
|
|
|
|
```bash
|
|
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
|
|
```
|
|
|
|
Or set it as an environment variable when running your samples:
|
|
|
|
```bash
|
|
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 python configure_otel_providers_with_env_var.py
|
|
```
|
|
|
|
### Viewing telemetry data
|
|
|
|
> Make sure you have the dashboard running to receive telemetry data.
|
|
|
|
Once your sample finishes running, navigate to <http://localhost:18888> in a web browser to see the telemetry data. Follow the [Aspire Dashboard exploration guide](https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/explore) to authenticate to the dashboard and start exploring your traces, logs, and metrics!
|