# FIDES Implementation Summary

## Overview

**FIDES** is a comprehensive deterministic prompt injection defense system for the agent framework. The implementation provides label-based security mechanisms to defend against prompt injection attacks by tracking the integrity and confidentiality of content throughout agent execution.

**🚀 Key Features:**

- **Context Provider Pattern** - `SecureAgentConfig` extends `ContextProvider`, injecting tools, instructions, and middleware automatically
- **Automatic Variable Hiding** - UNTRUSTED content is hidden automatically, with no manual intervention required
- **Per-Item Embedded Labels** - Tools return `list[Content]` built with `Content.from_text()` for proper label propagation
- **SecureAgentConfig** - One-line secure agent configuration via `context_providers=[config]`
- **Data Exfiltration Prevention** - `max_allowed_confidentiality` prevents sensitive data leakage
- **Message-Level Label Tracking** (Phase 1) - Track labels on every message in the conversation

## Architecture Components

The FIDES defense system consists of seven main components:

1. **Content Labeling Infrastructure** - Labels for tracking integrity and confidentiality
2. **Label Tracking Middleware** - Automatically assigns and propagates labels, and hides untrusted content
3. **Per-Item Embedded Labels** - Tools can return mixed-trust data with per-item security labels
4. **Policy Enforcement Middleware** - Blocks tool calls that violate security policies
5. **Security Tools** - Specialized tools for safe handling of untrusted content (`quarantined_llm`, `inspect_variable`)
6. **SecureAgentConfig** - Context provider for easy secure agent configuration
7. **Message-Level Label Tracking** - Track labels on every message in the conversation (Phase 1)

## Implementation Details

### Files Created

1. **`python/packages/core/agent_framework/security.py`** (~2,950 lines; all security primitives, middleware, tools, and configuration in a single public module)
   - `IntegrityLabel` enum (TRUSTED/UNTRUSTED)
   - `ConfidentialityLabel` enum (PUBLIC/PRIVATE/USER_IDENTITY)
   - `ContentLabel` class with serialization support
   - `combine_labels()` function for label composition
   - `ContentVariableStore` for client-side content storage
   - `VariableReferenceContent` for variable indirection
   - `LabeledMessage` class (inherits from `Message`) for message-level tracking
   - `check_confidentiality_allowed()` helper for data exfiltration prevention
   - `LabelTrackingFunctionMiddleware` - Tracks and propagates security labels
   - `PolicyEnforcementFunctionMiddleware` - Enforces security policies
   - `SecureAgentConfig` extends `ContextProvider` - Automatic secure agent configuration
   - `quarantined_llm()` - Isolated LLM calls with labeled data
   - `inspect_variable()` - Controlled variable content inspection
   - `store_untrusted_content()` - Helper for manual variable indirection (legacy)
   - `get_security_tools()` - Returns list of security tools
   - `SECURITY_TOOL_INSTRUCTIONS` - Detailed guidance for agents

2. **`FIDES_DEVELOPER_GUIDE.md`** (~1,250 lines)
   - Located at `python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md`
   - Complete documentation of the FIDES security system
   - Architecture overview and design rationale
   - Usage examples (6+ comprehensive scenarios)
   - Best practices and configuration options
   - API reference with full parameter documentation
   - Data exfiltration prevention documentation

3. **`python/packages/core/tests/test_security.py`** (800+ lines)
   - Unit tests for `ContentLabel` and label operations
   - Tests for `ContentVariableStore` functionality
   - Tests for `VariableReferenceContent`
   - Middleware behavior tests (label tracking and policy enforcement)
   - Automatic hiding tests
   - Per-item embedded label tests
   - Context label tracking tests
   - Message-level tracking tests (Phase 1)
   - Data exfiltration prevention tests

4. **`docs/decisions/0024-prompt-injection-defense.md`**
   - Architecture Decision Record (ADR)
   - Design rationale and alternatives considered
   - Security properties and guarantees

5. **`python/samples/02-agents/security/README.md`**
   - Sample-focused entry point for the two runnable FIDES security samples
   - Prerequisites, run commands, and links to the developer guide for deeper details

### Files Modified

1. **`python/packages/core/agent_framework/__init__.py`**
   - Removed root-level security exports so `agent_framework.security` is the canonical import surface

## Core Features

### 1. Content Labeling Infrastructure

- **IntegrityLabel**: TRUSTED (user input) vs UNTRUSTED (AI-generated, external)
- **ConfidentialityLabel**: PUBLIC, PRIVATE, USER_IDENTITY
- **Label Combination**: The most restrictive label wins (any UNTRUSTED input makes the result UNTRUSTED), and metadata is merged
- **Serialization**: Full support for `to_dict()` and `from_dict()`

### 2. Per-Item Embedded Labels

Tools returning mixed-trust data embed labels on individual items using `Content.from_text()`:

```python
import json

from agent_framework import Content, tool

# Hypothetical in-memory inbox for illustration; a real tool would query a mail backend.
INBOX = [
    {"id": "1", "body": "Team sync moved to 3pm.", "internal": True},
    {"id": "2", "body": "Click here to claim your prize.", "internal": False},
]


@tool(description="Fetch emails from inbox")
async def fetch_emails(count: int = 5) -> list[Content]:
    emails = INBOX[:count]
    return [
        Content.from_text(
            json.dumps({
                "id": email["id"],
                "body": email["body"],
            }),
            additional_properties={
                # Per-item label: internal mail is trusted, external mail is not.
                "security_label": {
                    "integrity": "trusted" if email["internal"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        )
        for email in emails
    ]
```

These embedded labels are automatically consumed by `LabelTrackingFunctionMiddleware`, which:

- Extracts the `security_label` from `additional_properties`
- Uses the embedded label as the highest-priority source for that item
- Automatically hides UNTRUSTED items in the variable store
- Replaces hidden items with `VariableReferenceContent` in the LLM context
- Keeps TRUSTED items visible to the LLM without tainting the context label

This enables tools to return mixed-trust data where some items (internal emails) remain visible while untrusted items (external emails) are automatically hidden without manual intervention.

### 3. Automatic Variable Hiding

This feature automatically hides any UNTRUSTED content returned by tools while keeping the hiding logic transparent to the developer. Developers do not need to call `store_untrusted_content()` manually, and the LLM/agent's context remains clean and secure.

Key aspects include:

- **Automatic Detection**: Middleware checks the integrity label after each tool call
- **Automatic Storage**: UNTRUSTED results/items are stored in the variable store
- **Transparent Replacement**: The LLM context receives `VariableReferenceContent` instead of the raw content
- **Context Label Protection**: Hidden content does NOT taint the context label

### 4. Context Label Tracking

- The context label starts as TRUSTED + PUBLIC
- It is updated (tainted) when non-hidden untrusted content enters the context
- Policy enforcement uses the context label for validation
- The middleware provides `get_context_label()` and `reset_context_label()` methods

### 5. Data Exfiltration Prevention

Tools declare `max_allowed_confidentiality` to prevent sensitive data leakage:

```python
from agent_framework import tool


@tool(
    description="Post to public Slack channel",
    additional_properties={
        "max_allowed_confidentiality": "public",  # Blocks PRIVATE data
    },
)
async def post_to_slack(channel: str, message: str) -> dict:
    return {"status": "posted"}
```

### 6. SecureAgentConfig (Context Provider)

`SecureAgentConfig` extends `ContextProvider` for automatic secure agent configuration:

```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

# `client`, `quarantine_client`, and `my_tool` are assumed to be defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web", "fetch_data"},
    block_on_violation=True,
    quarantine_chat_client=quarantine_client,  # Optional: real LLM for quarantine
)

# Context provider injects tools, instructions, and middleware automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[my_tool],
    context_providers=[config],  # That's it!
)
```

## Security Properties

### Deterministic Defense

1. **Tiered label propagation**: Every tool result receives a label via a 3-tier priority (embedded > `source_integrity` > join of input labels)
2. **Context tracking**: Cumulative security state tracked across turns
3. **Policy enforcement**: Violations blocked before execution
4. **Content isolation**: Untrusted content stored as variables
5. **Taint propagation**: Once the context becomes UNTRUSTED, it stays UNTRUSTED
6. **Data exfiltration prevention**: `max_allowed_confidentiality` gates output destinations
7. **Audit trail**: All security events logged
8. **No runtime guessing**: Deterministic label assignment

### Attack Prevention

- **Direct prompt injection**: Variables hide actual content from the LLM
- **Indirect prompt injection**: Labels track untrusted AI-generated calls
- **Privilege escalation**: Policy blocks untrusted calls to privileged tools
- **Data exfiltration**: Confidentiality labels + `max_allowed_confidentiality` enforced
- **Tool misuse**: Only allowlisted tools accept untrusted inputs

## Configuration Options

### LabelTrackingFunctionMiddleware

- `default_integrity`: Default label for unknown sources
- `default_confidentiality`: Default confidentiality level
- `auto_hide_untrusted`: Enable automatic variable hiding (default: True)
- `hide_threshold`: Integrity level at which hiding occurs (default: UNTRUSTED)

### PolicyEnforcementFunctionMiddleware

- `allow_untrusted_tools`: Set of tools accepting untrusted inputs
- `block_on_violation`: Block vs warn on violations
- `enable_audit_log`: Enable/disable audit logging

### Tool Metadata (via `additional_properties`)

- `confidentiality`: Tool's output confidentiality level
- `source_integrity`: Fallback integrity for unlabeled results (data-producing tools only)
- `accepts_untrusted`: Explicit untrusted input permission
- `max_allowed_confidentiality`: Maximum allowed input confidentiality (for sink tools)
- `requires_approval`: Human-in-the-loop requirement

## Usage Pattern

### Recommended: SecureAgentConfig as Context Provider

```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

# `client` and `search_web` are assumed to be defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web"},
    block_on_violation=True,
)

# Context provider injects everything automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[search_web],
    context_providers=[config],  # Tools, instructions, and middleware injected via before_run()
)
```

### Processing Hidden Content with quarantined_llm

```python
from agent_framework.security import quarantined_llm

# Agent automatically uses quarantined_llm with variable_ids.
# Must be awaited from within an async context.
result = await quarantined_llm(
    prompt="Summarize this data",
    variable_ids=["var_abc123"],  # Reference hidden content by ID
)
```

## Testing

Comprehensive test suite with:

- 115+ unit tests covering all components
- Label creation, serialization, combination
- Variable store operations
- Middleware behavior (tracking and enforcement)
- Automatic hiding with per-item labels
- Context label tracking
- Message-level tracking (Phase 1)
- Data exfiltration prevention
- Policy violation scenarios
- Audit log verification

Run tests:

```bash
cd python/packages/core && ../../.venv/bin/pytest tests/test_security.py -v
```

## Code Statistics

- **Total lines**: ~2,950 lines (single `security.py` module)
- **New modules**: 1 (`security.py`, consolidated from 3 original modules)
- **Total tests**: 115+ unit tests
- **Documentation**: 1,250+ lines in developer guide
- **Examples**: 6+ comprehensive scenarios

## Deliverables Checklist

### Core Implementation

- ✅ ContentLabel infrastructure with integrity and confidentiality
- ✅ ContentVariableStore for variable indirection
- ✅ VariableReferenceContent for safe context references
- ✅ LabelTrackingFunctionMiddleware for automatic labeling
- ✅ PolicyEnforcementFunctionMiddleware for policy enforcement
- ✅ quarantined_llm tool for isolated processing
- ✅ inspect_variable tool for controlled content access
- ✅ store_untrusted_content helper for manual variable indirection

### Automatic Hiding Enhancement

- ✅ Auto-hide UNTRUSTED content with `auto_hide_untrusted` flag
- ✅ Per-middleware ContentVariableStore instances
- ✅ Thread-local storage for middleware access from tools
- ✅ Automatic UNTRUSTED content replacement

### Per-Item Embedded Labels

- ✅ Support for `additional_properties.security_label` on individual items
- ✅ Mixed-trust data handling (hide untrusted, keep trusted visible)
- ✅ Fallback to `source_integrity` for unlabeled items

### Context Label Tracking

- ✅ Cumulative context label tracking across turns
- ✅ Hidden content does NOT taint context
- ✅ `get_context_label()` and `reset_context_label()` methods
- ✅ Policy enforcement uses context label

### Data Exfiltration Prevention

- ✅ `max_allowed_confidentiality` tool property
- ✅ `check_confidentiality_allowed()` helper function
- ✅ Policy enforcement validates confidentiality flow

### SecureAgentConfig

- ✅ Context provider pattern with `ContextProvider` base class
- ✅ `before_run()` hook for automatic injection of tools, instructions, and middleware
- ✅ One-line secure agent configuration via `context_providers=[config]`
- ✅ `get_tools()`, `get_instructions()`, `get_middleware()` methods (for manual use)
- ✅ `quarantine_chat_client` support for real LLM calls
- ✅ `SECURITY_TOOL_INSTRUCTIONS` constant

### Documentation & Testing

- ✅ Complete FIDES Developer Guide (~1,250 lines)
- ✅ Architecture Decision Record (ADR)
- ✅ Quick Start Guide
- ✅ Comprehensive test suite (115+ tests)
- ✅ Example code with 6+ scenarios
- ✅ 3 complete security examples (email, repo confidentiality, GitHub MCP labels)

## Summary

**FIDES** provides a comprehensive, deterministic defense against prompt injection attacks with:

- **Zero-effort protection**: Automatic variable hiding for developers
- **Context provider pattern**: `SecureAgentConfig` extends `ContextProvider` for automatic setup
- **Granular control**: Per-item embedded labels via `Content.from_text()` for mixed-trust data
- **Easy configuration**: `SecureAgentConfig` for one-line setup
- **Data safety**: Exfiltration prevention via confidentiality gates
- **Full traceability**: Message-level label tracking
- **Complete auditability**: All security events logged

With automatic hiding enabled, untrusted content is replaced by variable references before it reaches the LLM context, and every tool call is policy-checked against the cumulative security state before execution.
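
## Appendix: Illustrative Label Semantics

The label rules described under Content Labeling Infrastructure, Context Label Tracking, and Data Exfiltration Prevention can be sketched in a few lines of standalone Python. This is a minimal sketch and not the code in `security.py`: the names `Integrity`, `Confidentiality`, `Label`, `combine`, `confidentiality_allowed`, and `update_context` are hypothetical stand-ins, and the ordering PUBLIC < PRIVATE < USER_IDENTITY is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    """Hypothetical stand-in for IntegrityLabel; a higher value is less trusted."""
    TRUSTED = 0
    UNTRUSTED = 1


class Confidentiality(IntEnum):
    """Hypothetical stand-in for ConfidentialityLabel (ordering is assumed)."""
    PUBLIC = 0
    PRIVATE = 1
    USER_IDENTITY = 2


@dataclass(frozen=True)
class Label:
    integrity: Integrity = Integrity.TRUSTED
    confidentiality: Confidentiality = Confidentiality.PUBLIC


def combine(*labels: Label) -> Label:
    """Join labels by taking the most restrictive value on each axis."""
    return Label(
        integrity=max((lb.integrity for lb in labels), default=Integrity.TRUSTED),
        confidentiality=max(
            (lb.confidentiality for lb in labels), default=Confidentiality.PUBLIC
        ),
    )


def confidentiality_allowed(data: Label, max_allowed: Confidentiality) -> bool:
    """A sink accepts data only at or below its declared maximum."""
    return data.confidentiality <= max_allowed


def update_context(context: Label, result: Label, hidden: bool) -> Label:
    """Hidden results are replaced by references, so they do not taint the context."""
    return context if hidden else combine(context, result)


context = Label()  # starts as TRUSTED + PUBLIC
external = Label(Integrity.UNTRUSTED, Confidentiality.PRIVATE)

# Hidden untrusted content leaves the context label unchanged.
context = update_context(context, external, hidden=True)
assert context == Label()

# Visible untrusted content taints the context, and the taint persists
# even after later trusted results are combined in.
context = update_context(context, external, hidden=False)
context = update_context(context, Label(), hidden=False)
assert context.integrity == Integrity.UNTRUSTED

# A sink capped at PUBLIC rejects the now-PRIVATE context.
assert not confidentiality_allowed(context, Confidentiality.PUBLIC)
assert confidentiality_allowed(context, Confidentiality.PRIVATE)
```

Modelling both axes as ordered enums makes the "most restrictive wins" rule a plain `max`, which is one way to keep label assignment deterministic. The real `combine_labels()` also merges metadata, which this sketch omits.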