Files
agent-framework/python/packages/purview/README.md
Taisir Hassan 383d551b86 Purview: Parallelize PSPC cold-cache scope refresh (#5832)
* Parallelize Purview PSPC cold cache path

* Cache Purview payment-required state for scope refresh

* Cache Purview payment-required state for scope refresh

* Align Purview policy action dedupe and 402 caching

 Deduplicate combined policy actions by action and restriction action so restriction-only actions are preserved
without duplicating identical entries. Cache tenant-level payment-required state from background scope refresh so
subsequent calls short-circuit consistently.

* .NET: Implement best-effort caching for background job scope retrieval and add unit tests for cache write failures

* Purview - feat: Enhance ScopedContentProcessor to queue ContentActivityJob when no applicable scopes are found and update related tests

* docs: Update purview package README and AGENTS documentation to reflect caching optimizations and policy enforcement scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-09 18:01:21 +00:00

325 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
## Microsoft Agent Framework Purview Integration (Python)
`agent-framework-purview` adds Microsoft Purview (Microsoft Graph dataSecurityAndGovernance) policy evaluation to the Microsoft Agent Framework. It lets you enforce data security / governance policies on both the *prompt* (user input + conversation history) and the *model response* before they proceed further in your workflow.
> Status: **Preview**
### Key Features
- Middleware-based policy enforcement (agent-level and chat-client level)
- Blocks or allows content at both ingress (prompt) and egress (response)
- Works with any `Agent` / agent orchestration using the standard Agent Framework middleware pipeline
- Supports both synchronous `TokenCredential` and `AsyncTokenCredential` from `azure-identity`
- Configuration via `PurviewSettings` / `PurviewAppLocation`
- Built-in caching with configurable TTL and size limits for protection scopes in `PurviewSettings`
- Background processing for content activities and offline policy evaluation
### When to Use
Add Purview when you need to:
- **Prevent sensitive data leaks**: Inline blocking of sensitive content based on Data Loss Prevention (DLP) policies.
- **Enable governance**: Log AI interactions in Purview for Audit, Communication Compliance, Insider Risk Management, eDiscovery, and Data Lifecycle Management.
- Prevent sensitive or disallowed content from being sent to an LLM
- Prevent model output containing disallowed data from leaving the system
- Apply centrally managed policies without rewriting agent logic
---
## Prerequisites
- Microsoft Azure subscription with Microsoft Purview configured.
- Microsoft 365 subscription with an E5 license and pay-as-you-go billing setup.
- For testing, you can use a Microsoft 365 Developer Program tenant. For more information, see [Join the Microsoft 365 Developer Program](https://learn.microsoft.com/en-us/office/developer-program/microsoft-365-developer-program).
### Authentication
`PurviewClient` uses the `azure-identity` library for token acquisition. You can use any `TokenCredential` or `AsyncTokenCredential` implementation.
- **Entra registration**: Register your agent and add the required Microsoft Graph permissions (`dataSecurityAndGovernance`) to the Service Principal. For more information, see [Register an application in Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app) and [dataSecurityAndGovernance resource type](https://learn.microsoft.com/en-us/graph/api/resources/datasecurityandgovernance). You'll need the Microsoft Entra app ID in the next step.
- **Graph Permissions**:
- ProtectionScopes.Compute.All : [userProtectionScopeContainer](https://learn.microsoft.com/en-us/graph/api/userprotectionscopecontainer-compute)
- Content.Process.All : [processContent](https://learn.microsoft.com/en-us/graph/api/userdatasecurityandgovernance-processcontent)
- ContentActivity.Write : [contentActivity](https://learn.microsoft.com/en-us/graph/api/activitiescontainer-post-contentactivities)
- **Purview policies**: Configure Purview policies using the Microsoft Entra app ID to enable agent communications data to flow into Purview. For more information, see [Configure Microsoft Purview](https://learn.microsoft.com/purview/developer/configurepurview).
#### Scopes
`PurviewSettings.get_scopes()` derives the Graph scope list (currently `https://graph.microsoft.com/.default` style).
---
## Quick Start
```python
import asyncio
from agent_framework import Agent, Message, Role
from agent_framework.openai import OpenAIChatCompletionClient
from agent_framework.microsoft import PurviewPolicyMiddleware, PurviewSettings
from azure.identity import InteractiveBrowserCredential
async def main():
client = OpenAIChatCompletionClient() # uses environment for endpoint + deployment
purview_middleware = PurviewPolicyMiddleware(
credential=InteractiveBrowserCredential(),
settings=PurviewSettings(app_name="My Sample App")
)
agent = Agent(
client=client,
instructions="You are a helpful assistant.",
middleware=[purview_middleware]
)
response = await agent.run(Message("user", ["Summarize zero trust in one sentence."]))
print(response)
asyncio.run(main())
```
If a policy violation is detected on the prompt, the middleware terminates the run and substitutes a system message: `"Prompt blocked by policy"`. If on the response, the result becomes `"Response blocked by policy"`.
---
## Configuration
### `PurviewSettings`
```python
PurviewSettings(
app_name="My App", # Required: Display / logical name
app_version=None, # Optional: Version string of the application
tenant_id=None, # Optional: Tenant id (guid), used mainly for auth context
purview_app_location=None, # Optional: PurviewAppLocation for scoping
graph_base_uri="https://graph.microsoft.com/v1.0/",
blocked_prompt_message="Prompt blocked by policy", # Custom message for blocked prompts
blocked_response_message="Response blocked by policy", # Custom message for blocked responses
ignore_exceptions=False, # If True, non-payment exceptions are logged but not thrown
ignore_payment_required=False, # If True, 402 payment required errors are logged but not thrown
cache_ttl_seconds=14400, # Cache TTL in seconds (default 4 hours)
max_cache_size_bytes=200 * 1024 * 1024 # Max cache size in bytes (default 200MB)
)
```
### Caching
The Purview integration includes built-in caching for protection scopes responses to improve performance and reduce API calls:
- **Default TTL**: 4 hours (14400 seconds)
- **Default Cache Size**: 200MB
- **Cache Provider**: `InMemoryCacheProvider` is used by default, but you can provide a custom implementation via the `CacheProvider` protocol
- **Cache Invalidation**: Cache is automatically invalidated when protection scope state is modified
- **Exception Caching**: 402 Payment Required errors are cached to avoid repeated failed API calls
You can customize caching behavior in `PurviewSettings`:
```python
from agent_framework.microsoft import PurviewSettings
settings = PurviewSettings(
app_name="My App",
cache_ttl_seconds=14400, # 4 hours
max_cache_size_bytes=200 * 1024 * 1024 # 200MB
)
```
Or provide your own cache provider:
```python
from typing import Any
from agent_framework.microsoft import PurviewPolicyMiddleware, PurviewSettings, CacheProvider
from azure.identity import DefaultAzureCredential
class MyCustomCache(CacheProvider):
async def get(self, key: str) -> Any | None:
# Your implementation
pass
async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None:
# Your implementation
pass
async def remove(self, key: str) -> None:
# Your implementation
pass
credential = DefaultAzureCredential()
settings = PurviewSettings(app_name="MyApp")
middleware = PurviewPolicyMiddleware(
credential=credential,
settings=settings,
cache_provider=MyCustomCache()
)
```
To scope evaluation by location (application, URL, or domain):
```python
from agent_framework.microsoft import (
PurviewAppLocation,
PurviewLocationType,
PurviewSettings,
)
settings = PurviewSettings(
app_name="Contoso Support",
purview_app_location=PurviewAppLocation(
location_type=PurviewLocationType.APPLICATION,
location_value="<app-client-id>"
)
)
```
### Customizing Blocked Messages
By default, when Purview blocks a prompt or response, the middleware returns a generic system message. You can customize these messages by providing your own text in the `PurviewSettings`:
```python
from agent_framework.microsoft import PurviewSettings
settings = PurviewSettings(
app_name="My App",
blocked_prompt_message="Your request contains content that violates our policies. Please rephrase and try again.",
blocked_response_message="The response was blocked due to policy restrictions. Please contact support if you need assistance."
)
```
### Exception Handling Controls
The Purview integration provides fine-grained control over exception handling to support graceful degradation scenarios:
```python
from agent_framework.microsoft import PurviewSettings
# Ignore all non-payment exceptions (continue execution even if policy check fails)
settings = PurviewSettings(
app_name="My App",
ignore_exceptions=True # Log errors but don't throw
)
# Ignore only 402 Payment Required errors (useful for tenants without proper licensing)
settings = PurviewSettings(
app_name="My App",
ignore_payment_required=True # Continue even without Purview Consumptive Billing Setup
)
# Both can be combined
settings = PurviewSettings(
app_name="My App",
ignore_exceptions=True,
ignore_payment_required=True
)
```
### Selecting Agent vs Chat Middleware
Use the agent middleware when you already have / want the full agent pipeline:
```python
from agent_framework import Agent
from agent_framework.openai import OpenAIChatCompletionClient
from agent_framework.microsoft import PurviewPolicyMiddleware, PurviewSettings
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
client = OpenAIChatCompletionClient()
agent = Agent(
client=client,
instructions="You are helpful.",
middleware=[PurviewPolicyMiddleware(credential, PurviewSettings(app_name="My App"))]
)
```
Use the chat middleware when you attach directly to a chat client (e.g. minimal agent shell or custom orchestration):
```python
import os
from agent_framework import Agent
from agent_framework.openai import OpenAIChatCompletionClient
from agent_framework.microsoft import PurviewChatPolicyMiddleware, PurviewSettings
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
client = OpenAIChatCompletionClient(
model=os.environ["AZURE_OPENAI_MODEL"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
credential=credential,
middleware=[
PurviewChatPolicyMiddleware(credential, PurviewSettings(app_name="My App (Chat)"))
],
)
agent = Agent(client=client, instructions="You are helpful.")
```
The policy logic is identical; the difference is only the hook point in the pipeline.
---
## Middleware Lifecycle
1. **Before agent execution** (`prompt phase`): all `context.messages` are evaluated.
- If no valid user_id is found, processing is skipped (no policy evaluation)
- Protection scopes are retrieved (with caching)
- Applicable scopes are checked to determine execution mode
- In inline mode: content is evaluated immediately
- In offline mode: evaluation is queued in background
2. **If blocked**: `context.result` is replaced with a system message and `context.terminate = True`.
3. **After successful agent execution** (`response phase`): the produced messages are evaluated using the same user_id from the prompt phase.
4. **If blocked**: result messages are replaced with a blocking notice.
The user identifier is discovered from `Message.additional_properties['user_id']` during the prompt phase and reused for the response phase, ensuring both evaluations map consistently to the same user. If no user_id is present, policy evaluation is skipped entirely.
You can customize the blocking messages using the `blocked_prompt_message` and `blocked_response_message` fields in `PurviewSettings`. For more advanced scenarios, you can wrap the middleware or post-process `context.result` in later middleware.
---
## Exceptions
| Exception | Scenario |
|-----------|----------|
| `PurviewPaymentRequiredError` | 402 Payment Required - tenant lacks proper Purview licensing or consumptive billing setup |
| `PurviewAuthenticationError` | Token acquisition / validation issues |
| `PurviewRateLimitError` | 429 responses from service |
| `PurviewRequestError` | 4xx client errors (bad input, unauthorized, forbidden) |
| `PurviewServiceError` | 5xx or unexpected service errors |
### Exception Handling
All exceptions inherit from `PurviewServiceError`. You can catch specific exceptions or use the base class:
```python
from agent_framework.microsoft import (
PurviewPaymentRequiredError,
PurviewAuthenticationError,
PurviewRateLimitError,
PurviewRequestError,
PurviewServiceError
)
try:
# Your code here
pass
except PurviewPaymentRequiredError as ex:
# Handle licensing issues specifically
print(f"Purview licensing required: {ex}")
except (PurviewAuthenticationError, PurviewRateLimitError, PurviewRequestError, PurviewServiceError) as ex:
# Handle other errors
print(f"Purview enforcement skipped: {ex}")
```
---
## Notes
- **User Identification**: Provide a `user_id` per request (e.g. in `Message(..., additional_properties={"user_id": "<guid>"})`) for per-user policy scoping. If no user_id is provided, policy evaluation is skipped entirely.
- **Blocking Messages**: Can be customized via `blocked_prompt_message` and `blocked_response_message` in `PurviewSettings`. By default, they are "Prompt blocked by policy" and "Response blocked by policy" respectively.
- **Streaming Responses**: Post-response policy evaluation presently applies only to non-streaming chat responses.
- **Error Handling**: Use `ignore_exceptions` and `ignore_payment_required` settings for graceful degradation. When enabled, errors are logged but don't fail the request.
- **Caching**: Protection scopes responses and 402 errors are cached by default with a 4-hour TTL. Cache is automatically invalidated when protection scope state changes.
- **Cold-cache parallelization**: On a `ProtectionScopes` cache miss, scopes are refreshed in the background while `ProcessContent` runs in the foreground.
- **Background Processing**: Content Activities and offline Process Content requests are handled asynchronously using background tasks to avoid blocking the main execution flow.