# FIDES Implementation Summary

## Overview

**FIDES** is a comprehensive, deterministic prompt injection defense system for the agent framework. It provides label-based security mechanisms that defend against prompt injection attacks by tracking the integrity and confidentiality of content throughout agent execution.

**🚀 Key Features:**

- **Context Provider Pattern** - `SecureAgentConfig` extends `ContextProvider`, injecting tools, instructions, and middleware automatically
- **Automatic Variable Hiding** - UNTRUSTED content is hidden automatically, without manual intervention
- **Per-Item Embedded Labels** - Tools return `list[Content]` built with `Content.from_text()` so labels propagate per item
- **SecureAgentConfig** - One-line secure agent configuration via `context_providers=[config]`
- **Data Exfiltration Prevention** - `max_allowed_confidentiality` prevents sensitive data leakage
- **Message-Level Label Tracking** (Phase 1) - Track labels on every message in the conversation

## Architecture Components

The FIDES defense system consists of seven main components:

1. **Content Labeling Infrastructure** - Labels for tracking integrity and confidentiality
2. **Label Tracking Middleware** - Automatically assigns and propagates labels and hides untrusted content
3. **Per-Item Embedded Labels** - Tools can return mixed-trust data with per-item security labels
4. **Policy Enforcement Middleware** - Blocks tool calls that violate security policies
5. **Security Tools** - Specialized tools for safe handling of untrusted content (`quarantined_llm`, `inspect_variable`)
6. **SecureAgentConfig** - Context provider for easy secure agent configuration
7. **Message-Level Label Tracking** - Track labels on every message in the conversation (Phase 1)

## Implementation Details

### Files Created

1. **`python/packages/core/agent_framework/security.py`** (~2950 lines; all security primitives, middleware, tools, and configuration in a single public module)
   - `IntegrityLabel` enum (TRUSTED/UNTRUSTED)
   - `ConfidentialityLabel` enum (PUBLIC/PRIVATE/USER_IDENTITY)
   - `ContentLabel` class with serialization support
   - `combine_labels()` function for label composition
   - `ContentVariableStore` for client-side content storage
   - `VariableReferenceContent` for variable indirection
   - `LabeledMessage` class (inherits from `Message`) for message-level tracking
   - `check_confidentiality_allowed()` helper for data exfiltration prevention
   - `LabelTrackingFunctionMiddleware` - Tracks and propagates security labels
   - `PolicyEnforcementFunctionMiddleware` - Enforces security policies
   - `SecureAgentConfig` - Extends `ContextProvider` for automatic secure agent configuration
   - `quarantined_llm()` - Isolated LLM calls with labeled data
   - `inspect_variable()` - Controlled inspection of variable content
   - `store_untrusted_content()` - Legacy helper for manual variable indirection
   - `get_security_tools()` - Returns the list of security tools
   - `SECURITY_TOOL_INSTRUCTIONS` - Detailed guidance for agents

2. **`FIDES_DEVELOPER_GUIDE.md`** (~1250 lines)
   - Located at `python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md`
   - Complete documentation of the FIDES security system
   - Architecture overview and design rationale
   - Usage examples (6+ comprehensive scenarios)
   - Best practices and configuration options
   - API reference with full parameter documentation
   - Data exfiltration prevention documentation

3. **`python/packages/core/tests/test_security.py`** (800+ lines)
   - Unit tests for `ContentLabel` and label operations
   - Tests for `ContentVariableStore` functionality
   - Tests for `VariableReferenceContent`
   - Middleware behavior tests (label tracking and policy enforcement)
   - Automatic hiding tests
   - Per-item embedded label tests
   - Context label tracking tests
   - Message-level tracking tests (Phase 1)
   - Data exfiltration prevention tests

4. **`docs/decisions/0024-prompt-injection-defense.md`**
   - Architecture Decision Record (ADR)
   - Design rationale and alternatives considered
   - Security properties and guarantees

5. **`python/samples/02-agents/security/README.md`**
   - Sample-focused entry point for the two runnable FIDES security samples
   - Prerequisites, run commands, and links to the developer guide for more detail

### Files Modified

1. **`python/packages/core/agent_framework/__init__.py`**
   - Removed root-level security exports so that `agent_framework.security` is the canonical import surface

## Core Features

### 1. Content Labeling Infrastructure

- **IntegrityLabel**: TRUSTED (user input) vs. UNTRUSTED (AI-generated or external content)
- **ConfidentialityLabel**: PUBLIC, PRIVATE, USER_IDENTITY
- **Label Combination**: `combine_labels()` applies the most restrictive policy: if any input is UNTRUSTED, the result is UNTRUSTED, and metadata is merged
- **Serialization**: Full support for `to_dict()` and `from_dict()`

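
To make the combination rule concrete, here is a minimal, self-contained sketch of a two-dimensional label and a join operation. It does not import `agent_framework.security`. The names `Integrity`, `Confidentiality`, `Label`, and `combine` are hypothetical stand-ins. The sketch assumes two things: confidentiality levels are ordered PUBLIC < PRIVATE < USER_IDENTITY, and "most restrictive" means the more sensitive level wins. With no inputs, the join returns TRUSTED + PUBLIC, which matches the starting context label.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Integrity(Enum):
    """Hypothetical stand-in for IntegrityLabel."""

    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


class Confidentiality(Enum):
    """Hypothetical stand-in for ConfidentialityLabel, declared least to most sensitive (assumed order)."""

    PUBLIC = "public"
    PRIVATE = "private"
    USER_IDENTITY = "user_identity"


_CONF_ORDER = list(Confidentiality)  # definition order doubles as sensitivity order


@dataclass(frozen=True)
class Label:
    integrity: Integrity = Integrity.TRUSTED
    confidentiality: Confidentiality = Confidentiality.PUBLIC
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {
            "integrity": self.integrity.value,
            "confidentiality": self.confidentiality.value,
            "metadata": dict(self.metadata),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> Label:
        return cls(
            Integrity(data["integrity"]),
            Confidentiality(data["confidentiality"]),
            dict(data.get("metadata", {})),
        )


def combine(*labels: Label) -> Label:
    """Join labels: UNTRUSTED is sticky, the most sensitive confidentiality wins, metadata is merged."""
    integrity = Integrity.TRUSTED
    confidentiality = Confidentiality.PUBLIC
    metadata: dict[str, Any] = {}
    for label in labels:
        if label.integrity is Integrity.UNTRUSTED:
            integrity = Integrity.UNTRUSTED
        if _CONF_ORDER.index(label.confidentiality) > _CONF_ORDER.index(confidentiality):
            confidentiality = label.confidentiality
        metadata.update(label.metadata)
    return Label(integrity, confidentiality, metadata)


if __name__ == "__main__":
    joined = combine(
        Label(),  # trusted user input
        Label(Integrity.UNTRUSTED, metadata={"source": "web"}),
        Label(confidentiality=Confidentiality.PRIVATE),
    )
    print(joined.to_dict())
```

Because the join only ever moves toward UNTRUSTED and higher confidentiality, the combination order does not matter. This is what makes label propagation deterministic.
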
### 2. Per-Item Embedded Labels

Tools returning mixed-trust data embed labels on individual items using `Content.from_text()`:

```python
import json

from agent_framework import Content, tool

# Illustrative sample data (hypothetical); a real tool would read from a mail API.
emails = [
    {"id": "1", "body": "Quarterly numbers attached.", "internal": True},
    {"id": "2", "body": "Click here to claim your prize!", "internal": False},
]


@tool(description="Fetch emails from inbox")
async def fetch_emails(count: int = 5) -> list[Content]:
    # One Content item per email, each carrying its own security label.
    return [
        Content.from_text(
            json.dumps({
                "id": email["id"],
                "body": email["body"],
            }),
            additional_properties={
                "security_label": {
                    # Internal mail is trusted; everything else is untrusted.
                    "integrity": "trusted" if email["internal"] else "untrusted",
                    "confidentiality": "private",
                },
            },
        )
        for email in emails[:count]
    ]
```

These embedded labels are automatically consumed by `LabelTrackingFunctionMiddleware`, which:

- Extracts the `security_label` from `additional_properties`
- Uses the embedded label as the highest-priority source for that item
- Automatically hides UNTRUSTED items in the variable store
- Replaces hidden items with `VariableReferenceContent` in the LLM context
- Keeps TRUSTED items visible to the LLM without tainting the context label

This lets tools return mixed-trust data: some items (internal emails) remain visible, while untrusted items (external emails) are hidden automatically, without manual intervention.

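
The per-item flow can be modeled without the library. What follows is a minimal sketch under stated assumptions, not the middleware's actual code. `VariableStore`, `hide_untrusted_items`, the dict shape standing in for `Content`, and the `variable_reference` marker are all hypothetical. Unlabeled items are treated as untrusted, which is a conservative assumption. The sketch shows two of the properties listed above: untrusted items are swapped for opaque references, and only items that stay visible can taint the context label.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VariableStore:
    """Hypothetical stand-in for ContentVariableStore: hidden text stays on the client side."""

    items: dict[str, str] = field(default_factory=dict)

    def put(self, text: str) -> str:
        var_id = f"var_{uuid.uuid4().hex[:8]}"
        self.items[var_id] = text
        return var_id

    def get(self, var_id: str) -> str:
        return self.items[var_id]


def hide_untrusted_items(
    items: list[dict[str, Any]],
    store: VariableStore,
    auto_hide: bool = True,
) -> tuple[list[dict[str, Any]], bool]:
    """Return the items the LLM may see and whether any *visible* item taints the context.

    Each item mimics a text Content as {"text": ..., "additional_properties": {...}}.
    """
    visible: list[dict[str, Any]] = []
    context_tainted = False
    for item in items:
        label = item.get("additional_properties", {}).get("security_label", {})
        untrusted = label.get("integrity", "untrusted") == "untrusted"
        if untrusted and auto_hide:
            # The LLM only sees an opaque reference; the text stays in the store.
            var_id = store.put(item["text"])
            visible.append({"type": "variable_reference", "variable_id": var_id})
        else:
            # Only content that actually reaches the LLM can taint the context label.
            visible.append(item)
            context_tainted = context_tainted or untrusted
    return visible, context_tainted
```

With `auto_hide=False`, the same untrusted item stays in the context and taints it. This is the situation automatic hiding is designed to avoid.
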
### 3. Automatic Variable Hiding

This feature automatically hides any UNTRUSTED content returned by tools, and the hiding is transparent to the developer: there is no need to call `store_untrusted_content()` manually. This keeps untrusted content out of the LLM/agent context. Key aspects include:

- **Automatic Detection**: Middleware checks the integrity label after each tool call
- **Automatic Storage**: UNTRUSTED results/items are stored in the variable store
- **Transparent Replacement**: The LLM context receives `VariableReferenceContent` instead
- **Context Label Protection**: Hidden content does NOT taint the context label

### 4. Context Label Tracking

- The context label starts as TRUSTED + PUBLIC
- It is updated (tainted) when non-hidden untrusted content enters the context
- Policy enforcement validates tool calls against the context label
- `get_context_label()` and `reset_context_label()` methods are provided

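
The context-label rules and the policy check can be modeled together. This sketch is hypothetical: `ContextTracker` and `check_tool_call` are illustrative names, not the `PolicyEnforcementFunctionMiddleware` API. It simplifies in two ways. The label is reduced to a single integrity flag. The policy consults only `allow_untrusted_tools` and `block_on_violation`, so confidentiality checks are left out.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger("fides_sketch")


@dataclass
class ContextTracker:
    """Hypothetical model of the cumulative context label (integrity only)."""

    untrusted: bool = False  # the context starts TRUSTED
    audit_log: list[str] = field(default_factory=list)

    def observe(self, visible_untrusted: bool) -> None:
        # Taint is monotonic: once UNTRUSTED, only an explicit reset clears it.
        self.untrusted = self.untrusted or visible_untrusted

    def reset(self) -> None:
        self.untrusted = False

    def check_tool_call(
        self,
        tool_name: str,
        allow_untrusted_tools: set[str],
        block_on_violation: bool = True,
    ) -> bool:
        """Return True if the call may run, recording any violation in the audit log."""
        if not self.untrusted or tool_name in allow_untrusted_tools:
            return True
        message = f"policy violation: {tool_name} called from an UNTRUSTED context"
        self.audit_log.append(message)
        if block_on_violation:
            return False
        logger.warning(message)  # warn-only mode lets the call through
        return True
```

The check runs before the tool executes. A blocked call therefore never produces side effects, and the audit log records both blocked and warn-only violations.
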
### 5. Data Exfiltration Prevention

Tools declare `max_allowed_confidentiality` to prevent sensitive data leakage:

```python
from agent_framework import tool


@tool(
    description="Post to public Slack channel",
    additional_properties={
        "max_allowed_confidentiality": "public",  # Blocks PRIVATE data
    },
)
async def post_to_slack(channel: str, message: str) -> dict:
    # Placeholder body; a real tool would call the Slack API here.
    return {"status": "posted"}
```

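
A sink check like `max_allowed_confidentiality` reduces to comparing two positions on an ordered scale. The sketch makes two assumptions: the levels are ordered `public` < `private` < `user_identity`, and a tool without the property imposes no limit. `is_flow_allowed`, `may_call`, and the `TOOL_PROPERTIES` table are hypothetical. They are not the signature of `check_confidentiality_allowed()`.

```python
from __future__ import annotations

# Assumed ordering, least to most sensitive.
LEVELS = ("public", "private", "user_identity")

# Hypothetical per-tool metadata, mirroring the additional_properties shown above.
TOOL_PROPERTIES: dict[str, dict[str, str]] = {
    "post_to_slack": {"max_allowed_confidentiality": "public"},
    "save_private_note": {},
}


def is_flow_allowed(data_confidentiality: str, max_allowed: str | None) -> bool:
    """True if data at `data_confidentiality` may flow into a sink limited to `max_allowed`."""
    if max_allowed is None:
        return True  # assumption: no declared limit means no restriction
    return LEVELS.index(data_confidentiality) <= LEVELS.index(max_allowed)


def may_call(tool_name: str, context_confidentiality: str) -> bool:
    """Check a tool call against the tool's declared confidentiality limit."""
    limit = TOOL_PROPERTIES.get(tool_name, {}).get("max_allowed_confidentiality")
    return is_flow_allowed(context_confidentiality, limit)
```

For example, once a PRIVATE email has entered the context, `may_call("post_to_slack", "private")` is False, so the public channel cannot receive it.
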
### 6. SecureAgentConfig (Context Provider)

`SecureAgentConfig` extends `ContextProvider` for automatic secure agent configuration:

```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

# `client`, `quarantine_client`, and `my_tool` are assumed to be defined elsewhere
# (for example, chat clients such as OpenAIChatClient and a function decorated with @tool).
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web", "fetch_data"},
    block_on_violation=True,
    quarantine_chat_client=quarantine_client,  # Optional: real LLM for quarantine
)

# The context provider injects tools, instructions, and middleware automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[my_tool],
    context_providers=[config],  # That's it!
)
```

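
The context-provider pattern comes down to a hook that extends per-run state before the agent calls the model. The sketch below is a hypothetical model of that idea, not `SecureAgentConfig`'s implementation. `RunContext` and `SecurityProviderSketch` are invented names. Its ordering choices are assumptions: security instructions are appended after the developer's own, and tools are de-duplicated.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RunContext:
    """Hypothetical per-run state that a context provider can extend."""

    instructions: list[str] = field(default_factory=list)
    tools: list[Callable[..., Any]] = field(default_factory=list)
    middleware: list[Any] = field(default_factory=list)


@dataclass
class SecurityProviderSketch:
    """Hypothetical stand-in for the before_run() injection described above."""

    security_instructions: str
    security_tools: list[Callable[..., Any]]
    security_middleware: list[Any]

    def before_run(self, ctx: RunContext) -> None:
        # Security guidance goes after the developer's own instructions.
        ctx.instructions.append(self.security_instructions)
        # Add security tools, skipping any the developer already registered.
        new_tools = [t for t in self.security_tools if t not in ctx.tools]
        ctx.tools.extend(new_tools)
        # Register the security middleware for this run.
        ctx.middleware.extend(self.security_middleware)
```

The developer only registers the provider once. Everything the security layer needs is attached on each run, which is why the agent definition itself stays unchanged.
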
## Security Properties

### Deterministic Defense

1. **Tiered label propagation**: Every tool result receives a label via a 3-tier priority (embedded label > tool's `source_integrity` > join of the input labels)
2. **Context tracking**: Cumulative security state is tracked across turns
3. **Policy enforcement**: Violations are blocked before execution
4. **Content isolation**: Untrusted content is stored as variables
5. **Taint propagation**: Once the context becomes UNTRUSTED, it stays UNTRUSTED
6. **Data exfiltration prevention**: `max_allowed_confidentiality` gates output destinations
7. **Audit trail**: All security events are logged
8. **No runtime guessing**: Label assignment is deterministic

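
The three-tier priority can be read as a short fallback chain. The hypothetical `resolve_integrity` below covers only the integrity dimension. One behavior is an assumption: when a call has no labeled inputs, the function falls back to a `default_integrity` argument, echoing the middleware option of that name.

```python
from __future__ import annotations

from typing import Any


def resolve_integrity(
    item_properties: dict[str, Any],
    tool_properties: dict[str, Any],
    input_integrities: list[str],
    default_integrity: str = "untrusted",
) -> str:
    """Hypothetical 3-tier resolver: embedded label > source_integrity > join of inputs."""
    # Tier 1: a label embedded on the item itself wins.
    embedded = item_properties.get("security_label", {}).get("integrity")
    if embedded is not None:
        return embedded
    # Tier 2: the tool's declared source_integrity.
    source = tool_properties.get("source_integrity")
    if source is not None:
        return source
    # Tier 3: join of the call's input labels; any untrusted input taints the result.
    if not input_integrities:
        return default_integrity  # assumption for calls with no labeled inputs
    return "untrusted" if "untrusted" in input_integrities else "trusted"
```

Each tier is a plain lookup with no heuristics, so the same inputs always yield the same label.
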
### Attack Prevention

- **Direct prompt injection**: Variables hide the actual content from the LLM
- **Indirect prompt injection**: Labels track untrusted AI-generated calls
- **Privilege escalation**: Policy blocks untrusted calls to privileged tools
- **Data exfiltration**: Confidentiality labels and `max_allowed_confidentiality` are enforced
- **Tool misuse**: Only allow-listed tools accept untrusted inputs

## Configuration Options

### LabelTrackingFunctionMiddleware

- `default_integrity`: Default integrity label for unknown sources
- `default_confidentiality`: Default confidentiality level
- `auto_hide_untrusted`: Enable automatic variable hiding (default: True)
- `hide_threshold`: Integrity level at which hiding occurs (default: UNTRUSTED)

### PolicyEnforcementFunctionMiddleware

- `allow_untrusted_tools`: Set of tools that accept untrusted inputs
- `block_on_violation`: Block vs. warn on violations
- `enable_audit_log`: Enable/disable audit logging

### Tool Metadata (via `additional_properties`)

- `confidentiality`: Confidentiality level of the tool's output
- `source_integrity`: Fallback integrity for unlabeled results (data-producing tools only)
- `accepts_untrusted`: Explicit permission to accept untrusted input
- `max_allowed_confidentiality`: Maximum allowed input confidentiality (for sink tools)
- `requires_approval`: Human-in-the-loop requirement

## Usage Pattern

### Recommended: SecureAgentConfig as Context Provider

```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web"},
    block_on_violation=True,
)

# The context provider injects everything automatically.
# `client` and `search_web` are assumed to be defined elsewhere (a chat client and a @tool function).
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[search_web],
    context_providers=[config],  # Tools, instructions, and middleware are injected via before_run()
)
```

### Processing Hidden Content with quarantined_llm

```python
import asyncio

from agent_framework.security import quarantined_llm


async def main() -> None:
    # In normal use the agent calls quarantined_llm itself, passing the variable IDs
    # it sees in its context; this direct call is for illustration.
    result = await quarantined_llm(
        prompt="Summarize this data",
        variable_ids=["var_abc123"],  # Reference hidden content by ID (placeholder ID)
    )
    print(result)


asyncio.run(main())
```

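
The idea behind `quarantined_llm` is that hidden text is only ever shown to a model that cannot call tools, and whatever comes back is still treated as untrusted. This sketch is hypothetical: `quarantined_call`, the plain-dict store, and the `llm` callable are illustrative. Labeling the output UNTRUSTED follows the input-label join rule from the Security Properties section. It does not reflect any documented return format.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


def quarantined_call(
    prompt: str,
    variable_ids: list[str],
    store: dict[str, str],
    llm: Callable[[str], str],
) -> dict[str, Any]:
    """Hypothetical quarantine pattern: untrusted text only reaches a tool-less model."""
    # Resolve hidden content; an unknown ID raises KeyError instead of being guessed.
    hidden = [store[var_id] for var_id in variable_ids]
    isolated_prompt = prompt + "\n\n" + "\n---\n".join(hidden)
    # `llm` is assumed to be a plain text-in/text-out call with no tool access.
    answer = llm(isolated_prompt)
    # The answer derives from untrusted input, so it inherits the UNTRUSTED label.
    return {"text": answer, "security_label": {"integrity": "untrusted"}}
```

Even if the hidden text contains an injected instruction, the quarantined model has no tools to act on it. Its output also remains labeled, so the main agent's policy checks still apply.
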
## Testing

Comprehensive test suite with:

- 115+ unit tests covering all components
- Label creation, serialization, combination
- Variable store operations
- Middleware behavior (tracking and enforcement)
- Automatic hiding with per-item labels
- Context label tracking
- Message-level tracking (Phase 1)
- Data exfiltration prevention
- Policy violation scenarios
- Audit log verification

Run tests:

```bash
cd python/packages/core && ../../.venv/bin/pytest tests/test_security.py -v
```

## Code Statistics

- **Total lines**: ~2,950 lines (single `security.py` module)
- **New modules**: 1 (`security.py`, consolidated from 3 original modules)
- **Total tests**: 115+ unit tests
- **Documentation**: 1,250+ lines in the developer guide
- **Examples**: 6+ comprehensive scenarios

## Deliverables Checklist

### Core Implementation

- ✅ ContentLabel infrastructure with integrity and confidentiality
- ✅ ContentVariableStore for variable indirection
- ✅ VariableReferenceContent for safe context references
- ✅ LabelTrackingFunctionMiddleware for automatic labeling
- ✅ PolicyEnforcementFunctionMiddleware for policy enforcement
- ✅ quarantined_llm tool for isolated processing
- ✅ inspect_variable tool for controlled content access
- ✅ store_untrusted_content helper for manual variable indirection

### Automatic Hiding Enhancement

- ✅ Auto-hide UNTRUSTED content with `auto_hide_untrusted` flag
- ✅ Per-middleware ContentVariableStore instances
- ✅ Thread-local storage for middleware access from tools
- ✅ Automatic UNTRUSTED content replacement

### Per-Item Embedded Labels

- ✅ Support for `additional_properties.security_label` on individual items
- ✅ Mixed-trust data handling (hide untrusted, keep trusted visible)
- ✅ Fallback to `source_integrity` for unlabeled items

### Context Label Tracking

- ✅ Cumulative context label tracking across turns
- ✅ Hidden content does NOT taint context
- ✅ `get_context_label()` and `reset_context_label()` methods
- ✅ Policy enforcement uses context label

### Data Exfiltration Prevention

- ✅ `max_allowed_confidentiality` tool property
- ✅ `check_confidentiality_allowed()` helper function
- ✅ Policy enforcement validates confidentiality flow

### SecureAgentConfig

- ✅ Context provider pattern with `ContextProvider` base class
- ✅ `before_run()` hook for automatic injection of tools, instructions, and middleware
- ✅ One-line secure agent configuration via `context_providers=[config]`
- ✅ `get_tools()`, `get_instructions()`, `get_middleware()` methods (for manual use)
- ✅ `quarantine_chat_client` support for real LLM calls
- ✅ `SECURITY_TOOL_INSTRUCTIONS` constant

### Documentation & Testing

- ✅ Complete FIDES Developer Guide (~1250 lines)
- ✅ Architecture Decision Record (ADR)
- ✅ Quick Start Guide
- ✅ Comprehensive test suite (115+ tests)
- ✅ Example code with 6+ scenarios
- ✅ 2 complete security examples (email, repo confidentiality)

## Summary

**FIDES** provides a comprehensive, deterministic defense against prompt injection attacks with:

- **Zero-effort protection**: Automatic variable hiding for developers
- **Context provider pattern**: `SecureAgentConfig` extends `ContextProvider` for automatic setup
- **Granular control**: Per-item embedded labels via `Content.from_text()` for mixed-trust data
- **Easy configuration**: `SecureAgentConfig` for one-line setup
- **Data safety**: Exfiltration prevention via confidentiality gates
- **Full traceability**: Message-level label tracking
- **Complete auditability**: All security events logged

The system ensures that untrusted content never directly reaches the LLM context and that all tool calls are policy-checked against the cumulative security state before execution.