agent-framework/docs/features/FIDES_IMPLEMENTATION_SUMMARY.md
Eduard van Valkenburg ddfbdf5c7a Python: information-flow control prompt injection defense (#5331)
Co-authored-by: shrutitople <shruti.tople@gmail.com>
Co-authored-by: Aashish <t-akolluri@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 18:08:08 +00:00

# FIDES Implementation Summary
## Overview
**FIDES** is a comprehensive, deterministic prompt injection defense system for the agent framework. It uses label-based security mechanisms to defend against prompt injection attacks by tracking the integrity and confidentiality of content throughout agent execution.
**🚀 Key Features:**
- **Context Provider Pattern** - `SecureAgentConfig` extends `ContextProvider`, injecting tools, instructions, and middleware automatically
- **Automatic Variable Hiding** - UNTRUSTED content is automatically hidden without requiring manual intervention
- **Per-Item Embedded Labels** - Tools return `list[Content]` with `Content.from_text()` for proper label propagation
- **SecureAgentConfig** - One-line secure agent configuration via `context_providers=[config]`
- **Data Exfiltration Prevention** - `max_allowed_confidentiality` prevents sensitive data leakage
- **Message-Level Label Tracking** (Phase 1) - Track labels on every message in the conversation
## Architecture Components
The FIDES defense system consists of seven main components:
1. **Content Labeling Infrastructure** - Labels for tracking integrity and confidentiality
2. **Label Tracking Middleware** - Automatically assigns and propagates labels, and hides untrusted content
3. **Per-Item Embedded Labels** - Tools can return mixed-trust data with per-item security labels
4. **Policy Enforcement Middleware** - Blocks tool calls that violate security policies
5. **Security Tools** - Specialized tools for safe handling of untrusted content (`quarantined_llm`, `inspect_variable`)
6. **SecureAgentConfig** - Context provider for easy secure agent configuration
7. **Message-Level Label Tracking** - Track labels on every message in the conversation (Phase 1)
## Implementation Details
### Files Created
1. **`python/packages/core/agent_framework/security.py`** (~2950 lines — all security primitives, middleware, tools, and configuration in a single public module)
- `IntegrityLabel` enum (TRUSTED/UNTRUSTED)
- `ConfidentialityLabel` enum (PUBLIC/PRIVATE/USER_IDENTITY)
- `ContentLabel` class with serialization support
- `combine_labels()` function for label composition
- `ContentVariableStore` for client-side content storage
- `VariableReferenceContent` for variable indirection
- `LabeledMessage` class (inherits from `Message`) for message-level tracking
- `check_confidentiality_allowed()` helper for data exfiltration prevention
- `LabelTrackingFunctionMiddleware` - Tracks and propagates security labels
- `PolicyEnforcementFunctionMiddleware` - Enforces security policies
- `SecureAgentConfig` extends `ContextProvider` - automatic secure agent configuration
- `quarantined_llm()` - Isolated LLM calls with labeled data
- `inspect_variable()` - Controlled variable content inspection
- `store_untrusted_content()` - Helper for manual variable indirection (legacy)
- `get_security_tools()` - Returns list of security tools
- `SECURITY_TOOL_INSTRUCTIONS` - Detailed guidance for agents
2. **`FIDES_DEVELOPER_GUIDE.md`** (~1250 lines)
- Located at `python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md`
- Complete documentation of the FIDES security system
- Architecture overview and design rationale
- Usage examples (6+ comprehensive scenarios)
- Best practices and configuration options
- API reference with full parameter documentation
- Data exfiltration prevention documentation
3. **`python/packages/core/tests/test_security.py`** (~800+ lines)
- Unit tests for ContentLabel and label operations
- Tests for ContentVariableStore functionality
- Tests for VariableReferenceContent
- Middleware behavior tests (label tracking and policy enforcement)
- Automatic hiding tests
- Per-item embedded label tests
- Context label tracking tests
- Message-level tracking tests (Phase 1)
- Data exfiltration prevention tests
4. **`docs/decisions/0024-prompt-injection-defense.md`**
- Architecture Decision Record (ADR)
- Design rationale and alternatives considered
- Security properties and guarantees
5. **`python/samples/02-agents/security/README.md`**
- Sample-focused entry point for the two runnable FIDES security samples
- Prerequisites, run commands, and links to the developer guide for deeper details
### Files Modified
1. **`python/packages/core/agent_framework/__init__.py`**
- Removed root-level security exports so `agent_framework.security` is the canonical import surface
## Core Features
### 1. Content Labeling Infrastructure
- **IntegrityLabel**: TRUSTED (user input) vs UNTRUSTED (AI-generated, external)
- **ConfidentialityLabel**: PUBLIC, PRIVATE, USER_IDENTITY
- **Label Combination**: The most restrictive label wins (any UNTRUSTED input makes the result UNTRUSTED), and metadata is merged
- **Serialization**: Full support for `to_dict()` and `from_dict()`
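The combination rule above can be illustrated with a minimal, self-contained sketch. It does not use the real `agent_framework.security` classes: `Integrity`, `Confidentiality`, `Label`, and `combine` are hypothetical stand-ins, the ordering PUBLIC < PRIVATE < USER_IDENTITY is an assumption, and metadata merging is omitted.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    # Higher value = more restrictive (ordering chosen for this sketch).
    TRUSTED = 0
    UNTRUSTED = 1


class Confidentiality(IntEnum):
    # Assumed ordering: PUBLIC < PRIVATE < USER_IDENTITY.
    PUBLIC = 0
    PRIVATE = 1
    USER_IDENTITY = 2


@dataclass(frozen=True)
class Label:
    integrity: Integrity = Integrity.TRUSTED
    confidentiality: Confidentiality = Confidentiality.PUBLIC

    def to_dict(self) -> dict:
        # Serialize to lowercase strings, matching the label dicts used in this document.
        return {
            "integrity": self.integrity.name.lower(),
            "confidentiality": self.confidentiality.name.lower(),
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Label":
        return cls(
            integrity=Integrity[data["integrity"].upper()],
            confidentiality=Confidentiality[data["confidentiality"].upper()],
        )


def combine(*labels: Label) -> Label:
    """Join labels by taking the most restrictive value on each axis."""
    return Label(
        integrity=max((lbl.integrity for lbl in labels), default=Integrity.TRUSTED),
        confidentiality=max(
            (lbl.confidentiality for lbl in labels), default=Confidentiality.PUBLIC
        ),
    )


merged = combine(
    Label(Integrity.TRUSTED, Confidentiality.PRIVATE),
    Label(Integrity.UNTRUSTED, Confidentiality.PUBLIC),
)
print(merged.to_dict())  # {'integrity': 'untrusted', 'confidentiality': 'private'}
```

Using ordered enums keeps the join a plain `max()` on each axis, so the result does not depend on the order in which labels are combined.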
### 2. Per-Item Embedded Labels
Tools returning mixed-trust data embed labels on individual items using `Content.from_text()`:
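```python
import json

from agent_framework import Content, tool

# Illustrative placeholder data; a real tool would load these from a mail API.
emails = [
    {"id": "1", "body": "Team lunch at noon", "internal": True},
    {"id": "2", "body": "Click this link now", "internal": False},
]


@tool(description="Fetch emails from inbox")
async def fetch_emails(count: int = 5) -> list[Content]:
    # Each item carries its own security label in additional_properties.
    return [
        Content.from_text(
            json.dumps({
                "id": email["id"],
                "body": email["body"],
            }),
            additional_properties={
                "security_label": {
                    "integrity": "trusted" if email["internal"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        )
        for email in emails[:count]
    ]
```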
These embedded labels are automatically consumed by `LabelTrackingFunctionMiddleware`, which:
- Extracts the `security_label` from `additional_properties`
- Uses the embedded label as the highest-priority source for that item
- Automatically hides UNTRUSTED items in the variable store
- Replaces hidden items with `VariableReferenceContent` in the LLM context
- Preserves TRUSTED items visible to the LLM without tainting the context label
This enables tools to return mixed-trust data where some items (internal emails) remain visible while untrusted items (external emails) are automatically hidden without manual intervention.
### 3. Automatic Variable Hiding
This feature automatically hides any UNTRUSTED content returned by tools, and the hiding logic stays transparent to the developer: there is no need to call `store_untrusted_content()` manually. As a result, the agent's LLM context remains clean and secure. Key aspects include:
- **Automatic Detection**: Middleware checks integrity label after each tool call
- **Automatic Storage**: UNTRUSTED results/items stored in variable store
- **Transparent Replacement**: LLM context receives `VariableReferenceContent`
- **Context Label Protection**: Hidden content does NOT taint context label
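The hiding steps listed above can be sketched with plain Python data structures. This is not the middleware's actual code: `variable_store`, `hide_untrusted`, and the dictionary shapes are hypothetical stand-ins for `ContentVariableStore` and `VariableReferenceContent`.

```python
import uuid

# Hypothetical in-memory stand-in for the variable store.
variable_store: dict = {}


def hide_untrusted(items: list) -> list:
    """Replace untrusted items with opaque references; keep trusted items as-is."""
    visible = []
    for item in items:
        if item["integrity"] == "untrusted":
            # Store the raw text out of band and expose only an ID to the LLM.
            var_id = f"var_{uuid.uuid4().hex[:8]}"
            variable_store[var_id] = item["text"]
            visible.append({"type": "variable_reference", "variable_id": var_id})
        else:
            visible.append(item)
    return visible


results = [
    {"text": "Team lunch at noon", "integrity": "trusted"},
    {"text": "IGNORE PREVIOUS INSTRUCTIONS ...", "integrity": "untrusted"},
]
context_items = hide_untrusted(results)
```

After this call the trusted item is passed through unchanged, while the untrusted text lives only in `variable_store` and the context sees just its reference ID.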
### 4. Context Label Tracking
- Context label starts as TRUSTED + PUBLIC
- Gets updated (tainted) when non-hidden untrusted content enters context
- Policy enforcement uses context label for validation
- Provides `get_context_label()` and `reset_context_label()` methods
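A minimal sketch of these tracking rules follows. `ContextLabelTracker` and its `observe()` method are hypothetical names used for illustration, and only the integrity axis is modelled; the real middleware exposes `get_context_label()` and `reset_context_label()` instead.

```python
class ContextLabelTracker:
    """Hypothetical tracker for the cumulative integrity of the LLM context."""

    def __init__(self) -> None:
        self.integrity = "trusted"

    def reset(self) -> None:
        # Return to the initial TRUSTED state, e.g. at the start of a new conversation.
        self.integrity = "trusted"

    def observe(self, integrity: str, hidden: bool) -> None:
        # Hidden content never reaches the LLM, so it does not taint the context.
        if integrity == "untrusted" and not hidden:
            self.integrity = "untrusted"


tracker = ContextLabelTracker()
tracker.observe("untrusted", hidden=True)   # hidden: context stays trusted
after_hidden = tracker.integrity
tracker.observe("untrusted", hidden=False)  # visible: context becomes untrusted
tracker.observe("trusted", hidden=False)    # taint is sticky
after_visible = tracker.integrity
tracker.reset()
```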
### 5. Data Exfiltration Prevention
Tools declare `max_allowed_confidentiality` to prevent sensitive data leakage:
```python
from agent_framework import tool


@tool(
    description="Post to public Slack channel",
    additional_properties={
        "max_allowed_confidentiality": "public",  # Blocks PRIVATE data
    },
)
async def post_to_slack(channel: str, message: str) -> dict:
    return {"status": "posted"}
```
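The gate behind `max_allowed_confidentiality` can be pictured as an ordered comparison. The sketch below is an assumption-laden illustration, not the real `check_confidentiality_allowed()` helper: the function name `confidentiality_allowed`, its signature, and the ordering public < private < user_identity are all hypothetical.

```python
# Assumed ordering for this sketch: public < private < user_identity.
_LEVELS = {"public": 0, "private": 1, "user_identity": 2}


def confidentiality_allowed(data_level: str, max_allowed: str) -> bool:
    """Return True if data at `data_level` may flow into a sink capped at `max_allowed`."""
    return _LEVELS[data_level] <= _LEVELS[max_allowed]


# A public Slack channel is capped at "public", so private data is rejected.
can_post_public = confidentiality_allowed("public", "public")
can_post_private = confidentiality_allowed("private", "public")
```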
### 6. SecureAgentConfig (Context Provider)
`SecureAgentConfig` extends `ContextProvider` for automatic secure agent configuration:
```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

# `client`, `quarantine_client`, and `my_tool` are assumed to be defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web", "fetch_data"},
    block_on_violation=True,
    quarantine_chat_client=quarantine_client,  # Optional: real LLM for quarantine
)

# Context provider injects tools, instructions, and middleware automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[my_tool],
    context_providers=[config],  # That's it!
)
```
## Security Properties
### Deterministic Defense
1. **Tiered label propagation**: Every tool result receives a label via a 3-tier priority (embedded label > `source_integrity` > join of input labels)
2. **Context tracking**: Cumulative security state tracked across turns
3. **Policy enforcement**: Violations blocked before execution
4. **Content isolation**: Untrusted content stored as variables
5. **Taint propagation**: Once context becomes UNTRUSTED, it stays UNTRUSTED
6. **Data exfiltration prevention**: `max_allowed_confidentiality` gates output destinations
7. **Audit trail**: All security events logged
8. **No runtime guessing**: Deterministic label assignment
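The 3-tier priority from item 1 can be sketched as a simple fallback chain. `resolve_integrity` is a hypothetical function written for this summary, not part of `security.py`, and its behaviour for a call with no labelled inputs (returning `"trusted"`) is an assumption; the real middleware has a configurable `default_integrity`.

```python
from typing import Optional, Sequence


def resolve_integrity(
    embedded: Optional[str],
    source_integrity: Optional[str],
    input_labels: Sequence[str],
) -> str:
    # Tier 1: a label embedded on the item itself wins.
    if embedded is not None:
        return embedded
    # Tier 2: fall back to the tool's declared source_integrity.
    if source_integrity is not None:
        return source_integrity
    # Tier 3: join the input labels; any untrusted input taints the result.
    return "untrusted" if "untrusted" in input_labels else "trusted"


tier1 = resolve_integrity("trusted", "untrusted", ["untrusted"])
tier2 = resolve_integrity(None, "untrusted", ["trusted"])
tier3 = resolve_integrity(None, None, ["trusted", "untrusted"])
```

Because each tier is a pure lookup, the same inputs always yield the same label, which is what makes the assignment deterministic.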
### Attack Prevention
- **Direct prompt injection**: Variables hide actual content from LLM
- **Indirect prompt injection**: Labels track untrusted AI-generated calls
- **Privilege escalation**: Policy blocks untrusted calls to privileged tools
- **Data exfiltration**: Confidentiality labels + `max_allowed_confidentiality` enforced
- **Tool misuse**: Only whitelisted tools accept untrusted inputs
## Configuration Options
### LabelTrackingFunctionMiddleware
- `default_integrity`: Default label for unknown sources
- `default_confidentiality`: Default confidentiality level
- `auto_hide_untrusted`: Enable automatic variable hiding (default: True)
- `hide_threshold`: Integrity level at which hiding occurs (default: UNTRUSTED)
### PolicyEnforcementFunctionMiddleware
- `allow_untrusted_tools`: Set of tools accepting untrusted inputs
- `block_on_violation`: Block vs warn on violations
- `enable_audit_log`: Enable/disable audit logging
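How these three options interact can be shown with a small stand-alone sketch. `PolicySketch` and its `check()` method are hypothetical and much simpler than `PolicyEnforcementFunctionMiddleware`; they only model the integrity check against the context label.

```python
from dataclasses import dataclass, field


@dataclass
class PolicySketch:
    allow_untrusted_tools: set = field(default_factory=set)
    block_on_violation: bool = True
    audit_log: list = field(default_factory=list)

    def check(self, tool_name: str, context_integrity: str) -> bool:
        """Return True if the tool call may proceed."""
        violation = (
            context_integrity == "untrusted"
            and tool_name not in self.allow_untrusted_tools
        )
        if violation:
            # Record the event whether the call is blocked or only warned about.
            self.audit_log.append({"tool": tool_name, "event": "policy_violation"})
            return not self.block_on_violation
        return True


policy = PolicySketch(allow_untrusted_tools={"search_web"})
allowed_search = policy.check("search_web", "untrusted")  # allow-listed tool
allowed_email = policy.check("send_email", "untrusted")   # blocked and logged
allowed_clean = policy.check("send_email", "trusted")     # trusted context
```

With `block_on_violation=False` the same violation would still be logged, but `check()` would return `True` (warn-only mode).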
### Tool Metadata (via `additional_properties`)
- `confidentiality`: Tool's output confidentiality level
- `source_integrity`: Fallback integrity for unlabeled results (data-producing tools only)
- `accepts_untrusted`: Explicit untrusted input permission
- `max_allowed_confidentiality`: Maximum allowed input confidentiality (for sink tools)
- `requires_approval`: Human-in-the-loop requirement
## Usage Pattern
### Recommended: SecureAgentConfig as Context Provider
```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

# `client` and `search_web` are assumed to be defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web"},
    block_on_violation=True,
)

# Context provider injects everything automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[search_web],
    context_providers=[config],  # Tools, instructions, and middleware injected via before_run()
)
```
### Processing Hidden Content with quarantined_llm
```python
from agent_framework.security import quarantined_llm

# The agent normally invokes quarantined_llm itself, passing variable_ids.
# The call below must run inside an async function.
result = await quarantined_llm(
    prompt="Summarize this data",
    variable_ids=["var_abc123"],  # Reference hidden content by ID
)
```
## Testing
Comprehensive test suite with:
- 115+ unit tests covering all components
- Label creation, serialization, combination
- Variable store operations
- Middleware behavior (tracking and enforcement)
- Automatic hiding with per-item labels
- Context label tracking
- Message-level tracking (Phase 1)
- Data exfiltration prevention
- Policy violation scenarios
- Audit log verification
Run tests:
```bash
cd python/packages/core && ../../.venv/bin/pytest tests/test_security.py -v
```
## Code Statistics
- **Total lines**: ~2,950+ lines (single `security.py` module)
- **New modules**: 1 (`security.py` — consolidated from 3 original modules)
- **Total tests**: 115+ unit tests
- **Documentation**: 1,250+ lines in developer guide
- **Examples**: 6+ comprehensive scenarios
## Deliverables Checklist
### Core Implementation
✅ ContentLabel infrastructure with integrity and confidentiality
✅ ContentVariableStore for variable indirection
✅ VariableReferenceContent for safe context references
✅ LabelTrackingFunctionMiddleware for automatic labeling
✅ PolicyEnforcementFunctionMiddleware for policy enforcement
✅ quarantined_llm tool for isolated processing
✅ inspect_variable tool for controlled content access
✅ store_untrusted_content helper for manual variable indirection
### Automatic Hiding Enhancement
✅ Auto-hide UNTRUSTED content with `auto_hide_untrusted` flag
✅ Per-middleware ContentVariableStore instances
✅ Thread-local storage for middleware access from tools
✅ Automatic UNTRUSTED content replacement
### Per-Item Embedded Labels
✅ Support for `additional_properties.security_label` on individual items
✅ Mixed-trust data handling (hide untrusted, keep trusted visible)
✅ Fallback to `source_integrity` for unlabeled items
### Context Label Tracking
✅ Cumulative context label tracking across turns
✅ Hidden content does NOT taint context
✅ `get_context_label()` and `reset_context_label()` methods
✅ Policy enforcement uses context label
### Data Exfiltration Prevention
✅ `max_allowed_confidentiality` tool property
✅ `check_confidentiality_allowed()` helper function
✅ Policy enforcement validates confidentiality flow
### SecureAgentConfig
✅ Context provider pattern with `ContextProvider` base class
✅ `before_run()` hook for automatic injection of tools, instructions, and middleware
✅ One-line secure agent configuration via `context_providers=[config]`
✅ `get_tools()`, `get_instructions()`, `get_middleware()` methods (for manual use)
✅ `quarantine_chat_client` support for real LLM calls
✅ `SECURITY_TOOL_INSTRUCTIONS` constant
### Documentation & Testing
✅ Complete FIDES Developer Guide (~1250 lines)
✅ Architecture Decision Record (ADR)
✅ Quick Start Guide
✅ Comprehensive test suite (115+ tests)
✅ Example code with 6+ scenarios
✅ 2 complete security examples (email, repo confidentiality)
## Summary
**FIDES** provides a comprehensive, deterministic defense against prompt injection attacks with:
- **Zero-effort protection**: Automatic variable hiding for developers
- **Context provider pattern**: `SecureAgentConfig` extends `ContextProvider` for automatic setup
- **Granular control**: Per-item embedded labels via `Content.from_text()` for mixed-trust data
- **Easy configuration**: `SecureAgentConfig` for one-line setup
- **Data safety**: Exfiltration prevention via confidentiality gates
- **Full traceability**: Message-level label tracking
- **Complete auditability**: All security events logged
With automatic hiding enabled, the system keeps untrusted content out of the LLM context, and every tool call is policy-checked against the cumulative security state before execution.