FIDES Implementation Summary
Overview
FIDES is a comprehensive deterministic prompt injection defense system for the agent framework. The implementation provides label-based security mechanisms to defend against prompt injection attacks by tracking integrity and confidentiality of content throughout agent execution.
🚀 Key Features:
- Context Provider Pattern - `SecureAgentConfig` extends `ContextProvider`, injecting tools, instructions, and middleware automatically
- Automatic Variable Hiding - UNTRUSTED content is automatically hidden without requiring manual intervention
- Per-Item Embedded Labels - Tools return `list[Content]` with `Content.from_text()` for proper label propagation
- SecureAgentConfig - One-line secure agent configuration via `context_providers=[config]`
- Data Exfiltration Prevention - `max_allowed_confidentiality` prevents sensitive data leakage
- Message-Level Label Tracking (Phase 1) - Track labels on every message in the conversation
Architecture Components
The FIDES defense system consists of seven main components:
- Content Labeling Infrastructure - Labels for tracking integrity and confidentiality
- Label Tracking Middleware - Automatically assigns, propagates labels, and hides untrusted content
- Per-Item Embedded Labels - Tools can return mixed-trust data with per-item security labels
- Policy Enforcement Middleware - Blocks tool calls that violate security policies
- Security Tools - Specialized tools for safe handling of untrusted content (`quarantined_llm`, `inspect_variable`)
- SecureAgentConfig - Context provider for easy secure agent configuration
- Message-Level Label Tracking - Track labels on every message in the conversation (Phase 1)
Implementation Details
Files Created
- `python/packages/core/agent_framework/security.py` (~2950 lines; all security primitives, middleware, tools, and configuration in a single public module)
  - `IntegrityLabel` enum (TRUSTED/UNTRUSTED)
  - `ConfidentialityLabel` enum (PUBLIC/PRIVATE/USER_IDENTITY)
  - `ContentLabel` class with serialization support
  - `combine_labels()` function for label composition
  - `ContentVariableStore` for client-side content storage
  - `VariableReferenceContent` for variable indirection
  - `LabeledMessage` class (inherits from `Message`) for message-level tracking
  - `check_confidentiality_allowed()` helper for data exfiltration prevention
  - `LabelTrackingFunctionMiddleware` - Tracks and propagates security labels
  - `PolicyEnforcementFunctionMiddleware` - Enforces security policies
  - `SecureAgentConfig` extends `ContextProvider` - automatic secure agent configuration
  - `quarantined_llm()` - Isolated LLM calls with labeled data
  - `inspect_variable()` - Controlled variable content inspection
  - `store_untrusted_content()` - Helper for manual variable indirection (legacy)
  - `get_security_tools()` - Returns list of security tools
  - `SECURITY_TOOL_INSTRUCTIONS` - Detailed guidance for agents
- `FIDES_DEVELOPER_GUIDE.md` (~1250 lines)
  - Located at `python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md`
  - Complete documentation of the FIDES security system
  - Architecture overview and design rationale
  - Usage examples (6+ comprehensive scenarios)
  - Best practices and configuration options
  - API reference with full parameter documentation
  - Data exfiltration prevention documentation
- `python/packages/core/tests/test_security.py` (800+ lines)
  - Unit tests for ContentLabel and label operations
  - Tests for ContentVariableStore functionality
  - Tests for VariableReferenceContent
  - Middleware behavior tests (label tracking and policy enforcement)
  - Automatic hiding tests
  - Per-item embedded label tests
  - Context label tracking tests
  - Message-level tracking tests (Phase 1)
  - Data exfiltration prevention tests
- `docs/decisions/0024-prompt-injection-defense.md`
  - Architecture Decision Record (ADR)
  - Design rationale and alternatives considered
  - Security properties and guarantees
- `python/samples/02-agents/security/README.md`
  - Sample-focused entry point for the two runnable FIDES security samples
  - Prerequisites, run commands, and links to the developer guide for deeper details
Files Modified
- `python/packages/core/agent_framework/__init__.py`
  - Removed root-level security exports so `agent_framework.security` is the canonical import surface
Core Features
### 1. Content Labeling Infrastructure
- IntegrityLabel: TRUSTED (user input) vs UNTRUSTED (AI-generated, external)
- ConfidentialityLabel: PUBLIC, PRIVATE, USER_IDENTITY
- Label Combination: The most restrictive label wins (UNTRUSTED dominates TRUSTED; metadata is merged)
- Serialization: Full support for `to_dict()` and `from_dict()`
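The most-restrictive combination rule above can be read as a join over two ordered axes. The snippet below is a standalone sketch of that idea, not the code in `security.py`: the `Integrity`, `Confidentiality`, `Label`, and `combine` names are hypothetical, and the ordering PUBLIC < PRIVATE < USER_IDENTITY is an assumption taken from the order in which the levels are listed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    TRUSTED = 0
    UNTRUSTED = 1


class Confidentiality(IntEnum):
    # Assumed ordering: a higher value means more restrictive.
    PUBLIC = 0
    PRIVATE = 1
    USER_IDENTITY = 2


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality


def combine(*labels: Label) -> Label:
    """Join labels by taking the most restrictive value on each axis."""
    if not labels:
        raise ValueError("at least one label is required")
    return Label(
        integrity=max(label.integrity for label in labels),
        confidentiality=max(label.confidentiality for label in labels),
    )


user_input = Label(Integrity.TRUSTED, Confidentiality.PUBLIC)
external_email = Label(Integrity.UNTRUSTED, Confidentiality.PRIVATE)
combined = combine(user_input, external_email)
print(combined.integrity.name, combined.confidentiality.name)
```

Modelling both axes as ordered enums keeps the join a pure function of its inputs, which is what makes the label assignment deterministic.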
### 2. Per-Item Embedded Labels
Tools returning mixed-trust data embed labels on individual items using `Content.from_text()`:
```python
import json

from agent_framework import Content, tool


@tool(description="Fetch emails from inbox")
async def fetch_emails(count: int = 5) -> list[Content]:
    # `emails` stands in for whatever the mail backend returns (not defined here).
    return [
        Content.from_text(
            json.dumps({
                "id": email["id"],
                "body": email["body"],
            }),
            additional_properties={
                "security_label": {
                    # Internal mail stays visible; external mail is hidden by the middleware.
                    "integrity": "trusted" if email["internal"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        )
        for email in emails
    ]
```
These embedded labels are automatically consumed by `LabelTrackingFunctionMiddleware`, which:
- Extracts the `security_label` from `additional_properties`
- Uses the embedded label as the highest-priority source for that item
- Automatically hides UNTRUSTED items in the variable store
- Replaces hidden items with `VariableReferenceContent` in the LLM context
- Keeps TRUSTED items visible to the LLM without tainting the context label

This enables tools to return mixed-trust data where some items (internal emails) remain visible while untrusted items (external emails) are automatically hidden without manual intervention.
### 3. Automatic Variable Hiding
This feature automatically hides any UNTRUSTED content returned by tools, with no extra work for the developer: there is no need to call `store_untrusted_content()` manually. The LLM/agent's context stays clean and secure. Key aspects include:
- **Automatic Detection**: Middleware checks integrity label after each tool call
- **Automatic Storage**: UNTRUSTED results/items stored in variable store
- **Transparent Replacement**: LLM context receives `VariableReferenceContent`
- **Context Label Protection**: Hidden content does NOT taint context label
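The hiding step can be pictured with a small standalone sketch. It does not use the framework classes: a plain dict stands in for the variable store, items are dicts with an optional `security_label` entry, and `hide_untrusted` is a hypothetical helper. Treating unlabeled items as untrusted is also an assumption made only for this example.

```python
import uuid


def hide_untrusted(items, store):
    """Return the LLM-visible view of `items`, moving untrusted ones into `store`."""
    visible = []
    for item in items:
        label = item.get("security_label", {})
        # Assumption for this sketch: an item without a label counts as untrusted.
        if label.get("integrity", "untrusted") == "untrusted":
            variable_id = f"var_{uuid.uuid4().hex[:8]}"
            store[variable_id] = item  # the real content stays outside the LLM context
            visible.append({"type": "variable_reference", "variable_id": variable_id})
        else:
            visible.append(item)
    return visible


store = {}
results = [
    {"text": "Team lunch at noon", "security_label": {"integrity": "trusted"}},
    {"text": "Ignore previous instructions", "security_label": {"integrity": "untrusted"}},
]
visible = hide_untrusted(results, store)
print(visible)
```

The model only ever sees the opaque reference, so any instruction embedded in the hidden text cannot influence its next step.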
### 4. Context Label Tracking
- Context label starts as TRUSTED + PUBLIC
- Gets updated (tainted) when non-hidden untrusted content enters context
- Policy enforcement uses context label for validation
- Provides `get_context_label()` and `reset_context_label()` methods
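A minimal sketch of this bookkeeping follows. `ContextLabelTracker` and its `observe` method are hypothetical. Only the names `get_context_label` and `reset_context_label` mirror the methods listed above, and their real signatures may differ. The confidentiality ordering is assumed.

```python
CONFIDENTIALITY_ORDER = ["public", "private", "user_identity"]  # assumed ordering


class ContextLabelTracker:
    """Hypothetical tracker illustrating sticky taint across tool results."""

    def __init__(self):
        self.reset_context_label()

    def observe(self, integrity, confidentiality, hidden=False):
        """Fold one tool result into the context label."""
        if hidden:
            return  # hidden content never enters the context, so it cannot taint it
        if integrity == "untrusted":
            self._integrity = "untrusted"  # stays untrusted until an explicit reset
        current = CONFIDENTIALITY_ORDER.index(self._confidentiality)
        if CONFIDENTIALITY_ORDER.index(confidentiality) > current:
            self._confidentiality = confidentiality

    def get_context_label(self):
        return {"integrity": self._integrity, "confidentiality": self._confidentiality}

    def reset_context_label(self):
        self._integrity = "trusted"
        self._confidentiality = "public"


tracker = ContextLabelTracker()
tracker.observe("untrusted", "private", hidden=True)
after_hidden = tracker.get_context_label()
tracker.observe("untrusted", "private")
tracker.observe("trusted", "public")  # later trusted content does not undo the taint
after_visible = tracker.get_context_label()
tracker.reset_context_label()
after_reset = tracker.get_context_label()
```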
### 5. Data Exfiltration Prevention
Tools declare `max_allowed_confidentiality` to prevent sensitive data leakage:
```python
@tool(
    description="Post to public Slack channel",
    additional_properties={
        "max_allowed_confidentiality": "public",  # Blocks PRIVATE data
    },
)
async def post_to_slack(channel: str, message: str) -> dict:
    return {"status": "posted"}
```
### 6. SecureAgentConfig (Context Provider)
`SecureAgentConfig` extends `ContextProvider` for automatic secure agent configuration:
```python
from agent_framework.security import SecureAgentConfig

# `client`, `quarantine_client`, and `my_tool` are assumed to be defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web", "fetch_data"},
    block_on_violation=True,
    quarantine_chat_client=quarantine_client,  # Optional: real LLM for quarantine
)
# Context provider injects tools, instructions, and middleware automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[my_tool],
    context_providers=[config],  # That's it!
)
```
Security Properties
Deterministic Defense
- Tiered label propagation: Every tool result receives a label via 3-tier priority (embedded > source_integrity > input labels join)
- Context tracking: Cumulative security state tracked across turns
- Policy enforcement: Violations blocked before execution
- Content isolation: Untrusted content stored as variables
- Taint propagation: Once context becomes UNTRUSTED, it stays UNTRUSTED
- Data exfiltration prevention: `max_allowed_confidentiality` gates output destinations
- Audit trail: All security events logged
- No runtime guessing: Deterministic label assignment
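The 3-tier priority in the first bullet (embedded > source_integrity > input labels join) can be sketched as a plain function. `resolve_integrity` is a hypothetical name, and the `default` fallback for a result with no label source at all is an assumption of this sketch.

```python
from typing import Optional, Sequence


def resolve_integrity(
    embedded: Optional[str],
    source_integrity: Optional[str],
    input_integrities: Sequence[str],
    default: str = "untrusted",
) -> str:
    """Pick the integrity label for one tool result using a fixed priority order."""
    if embedded is not None:
        return embedded  # tier 1: label embedded on the item itself
    if source_integrity is not None:
        return source_integrity  # tier 2: fallback declared on the tool
    if input_integrities:
        # tier 3: join of the input labels; a single untrusted input taints the result
        return "untrusted" if "untrusted" in input_integrities else "trusted"
    return default  # assumed fallback when nothing is known


print(resolve_integrity("trusted", "untrusted", ["untrusted"]))
print(resolve_integrity(None, "untrusted", ["trusted"]))
print(resolve_integrity(None, None, ["trusted", "untrusted"]))
```

Because each tier is checked in a fixed order, the same inputs always yield the same label, with no model judgement involved.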
Attack Prevention
- Direct prompt injection: Variables hide actual content from LLM
- Indirect prompt injection: Labels track untrusted AI-generated calls
- Privilege escalation: Policy blocks untrusted calls to privileged tools
- Data exfiltration: Confidentiality labels + `max_allowed_confidentiality` enforced
- Tool misuse: Only whitelisted tools accept untrusted inputs
Configuration Options
LabelTrackingFunctionMiddleware
- `default_integrity`: Default label for unknown sources
- `default_confidentiality`: Default confidentiality level
- `auto_hide_untrusted`: Enable automatic variable hiding (default: True)
- `hide_threshold`: Integrity level at which hiding occurs (default: UNTRUSTED)
PolicyEnforcementFunctionMiddleware
- `allow_untrusted_tools`: Set of tools accepting untrusted inputs
- `block_on_violation`: Block vs warn on violations
- `enable_audit_log`: Enable/disable audit logging
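How these options interact can be sketched with a standalone check. `check_tool_call` is a hypothetical function, not the middleware's API. It assumes the decision depends only on the context integrity and the allow list, and it uses the standard `logging` module as a stand-in for the audit log.

```python
import logging

logger = logging.getLogger("fides_sketch")


def check_tool_call(tool_name, context_integrity, allow_untrusted_tools, block_on_violation=True):
    """Return True if the call may proceed, False if it is blocked."""
    if context_integrity == "trusted" or tool_name in allow_untrusted_tools:
        return True
    # A violation is always recorded; whether it also blocks depends on the flag.
    logger.warning("policy violation: %s called from an untrusted context", tool_name)
    return not block_on_violation


allowed = {"search_web"}
print(check_tool_call("send_email", "trusted", allowed))
print(check_tool_call("search_web", "untrusted", allowed))
print(check_tool_call("send_email", "untrusted", allowed))
print(check_tool_call("send_email", "untrusted", allowed, block_on_violation=False))
```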
Tool Metadata (via additional_properties)
- `confidentiality`: Tool's output confidentiality level
- `source_integrity`: Fallback integrity for unlabeled results (data-producing tools only)
- `accepts_untrusted`: Explicit untrusted input permission
- `max_allowed_confidentiality`: Maximum allowed input confidentiality (for sink tools)
- `requires_approval`: Human-in-the-loop requirement
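The `max_allowed_confidentiality` gate amounts to a comparison against a ceiling. The sketch below uses a hypothetical `confidentiality_allowed` function (the signature of the real `check_confidentiality_allowed()` helper is not shown in this summary) and assumes the ordering public < private < user_identity.

```python
from typing import Optional

LEVELS = {"public": 0, "private": 1, "user_identity": 2}  # assumed ordering


def confidentiality_allowed(context_level: str, max_allowed: Optional[str]) -> bool:
    """Return True if data at `context_level` may flow into a sink with this ceiling."""
    if max_allowed is None:
        return True  # the tool declares no ceiling
    return LEVELS[context_level] <= LEVELS[max_allowed]


print(confidentiality_allowed("public", "public"))
print(confidentiality_allowed("private", "public"))
print(confidentiality_allowed("private", "user_identity"))
```

Under this reading, a sink tool marked `"public"` rejects any call made while the context holds private data, regardless of what the model intended.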
Usage Pattern
Recommended: SecureAgentConfig as Context Provider
```python
from agent_framework import Agent
from agent_framework.security import SecureAgentConfig

config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web"},
    block_on_violation=True,
)
# Context provider injects everything automatically
# (`client` and `search_web` are assumed to be defined elsewhere)
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[search_web],
    context_providers=[config],  # Tools, instructions, and middleware injected via before_run()
)
```
Processing Hidden Content with quarantined_llm
```python
from agent_framework.security import quarantined_llm

# Agent automatically uses quarantined_llm with variable_ids
result = await quarantined_llm(
    prompt="Summarize this data",
    variable_ids=["var_abc123"],  # Reference hidden content by ID
)
```
Testing
Comprehensive test suite with:
- 115+ unit tests covering all components
- Label creation, serialization, combination
- Variable store operations
- Middleware behavior (tracking and enforcement)
- Automatic hiding with per-item labels
- Context label tracking
- Message-level tracking (Phase 1)
- Data exfiltration prevention
- Policy violation scenarios
- Audit log verification
Run tests:
```bash
cd python/packages/core && ../../.venv/bin/pytest tests/test_security.py -v
```
Code Statistics
- Total lines: ~2,950+ lines (single `security.py` module)
- New modules: 1 (`security.py`, consolidated from 3 original modules)
- Total tests: 115+ unit tests
- Documentation: 1,250+ lines in developer guide
- Examples: 6+ comprehensive scenarios
Deliverables Checklist
Core Implementation
✅ ContentLabel infrastructure with integrity and confidentiality
✅ ContentVariableStore for variable indirection
✅ VariableReferenceContent for safe context references
✅ LabelTrackingFunctionMiddleware for automatic labeling
✅ PolicyEnforcementFunctionMiddleware for policy enforcement
✅ quarantined_llm tool for isolated processing
✅ inspect_variable tool for controlled content access
✅ store_untrusted_content helper for manual variable indirection
Automatic Hiding Enhancement
✅ Auto-hide UNTRUSTED content with auto_hide_untrusted flag
✅ Per-middleware ContentVariableStore instances
✅ Thread-local storage for middleware access from tools
✅ Automatic UNTRUSTED content replacement
Per-Item Embedded Labels
✅ Support for additional_properties.security_label on individual items
✅ Mixed-trust data handling (hide untrusted, keep trusted visible)
✅ Fallback to source_integrity for unlabeled items
Context Label Tracking
✅ Cumulative context label tracking across turns
✅ Hidden content does NOT taint context
✅ get_context_label() and reset_context_label() methods
✅ Policy enforcement uses context label
Data Exfiltration Prevention
✅ max_allowed_confidentiality tool property
✅ check_confidentiality_allowed() helper function
✅ Policy enforcement validates confidentiality flow
SecureAgentConfig
✅ Context provider pattern with ContextProvider base class
✅ before_run() hook for automatic injection of tools, instructions, and middleware
✅ One-line secure agent configuration via context_providers=[config]
✅ get_tools(), get_instructions(), get_middleware() methods (for manual use)
✅ quarantine_chat_client support for real LLM calls
✅ SECURITY_TOOL_INSTRUCTIONS constant
Documentation & Testing
✅ Complete FIDES Developer Guide (~1250 lines)
✅ Architecture Decision Record (ADR)
✅ Quick Start Guide
✅ Comprehensive test suite (115+ tests)
✅ Example code with 6+ scenarios
✅ 2 complete security examples (email, repo confidentiality)
Summary
FIDES provides a comprehensive, deterministic defense against prompt injection attacks with:
- Zero-effort protection: Automatic variable hiding for developers
- Context provider pattern: `SecureAgentConfig` extends `ContextProvider` for automatic setup
- Granular control: Per-item embedded labels via `Content.from_text()` for mixed-trust data
- Easy configuration: `SecureAgentConfig` for one-line setup
- Data safety: Exfiltration prevention via confidentiality gates
- Full traceability: Message-level label tracking
- Complete auditability: All security events logged
With automatic hiding enabled, untrusted content is kept out of the LLM context behind variable references, and every tool call is policy-checked against the cumulative security state before it executes.