mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Eduard van Valkenburg ddfbdf5c7a Python: information-flow control prompt injection defense (#5331 )


2026-05-05 18:08:08 +00:00

FIDES Implementation Summary

Overview

FIDES is a deterministic prompt injection defense system for the agent framework. It is based on information-flow control: every piece of content carries integrity and confidentiality labels, and the framework tracks those labels throughout agent execution and enforces security policies against them.

🚀 Key Features:

Context Provider Pattern - SecureAgentConfig extends ContextProvider, injecting tools, instructions, and middleware automatically
Automatic Variable Hiding - UNTRUSTED content is automatically hidden without requiring manual intervention
Per-Item Embedded Labels - Tools return list[Content] with Content.from_text() for proper label propagation
SecureAgentConfig - One-line secure agent configuration via context_providers=[config]
Data Exfiltration Prevention - max_allowed_confidentiality prevents sensitive data leakage
Message-Level Label Tracking (Phase 1) - Track labels on every message in the conversation

Architecture Components

The FIDES defense system consists of seven main components:

Content Labeling Infrastructure - Labels for tracking integrity and confidentiality
Label Tracking Middleware - Automatically assigns, propagates labels, and hides untrusted content
Per-Item Embedded Labels - Tools can return mixed-trust data with per-item security labels
Policy Enforcement Middleware - Blocks tool calls that violate security policies
Security Tools - Specialized tools for safe handling of untrusted content (quarantined_llm, inspect_variable)
SecureAgentConfig - Context provider for easy secure agent configuration
Message-Level Label Tracking - Track labels on every message in the conversation (Phase 1)

Implementation Details

Files Created

python/packages/core/agent_framework/security.py (~2950 lines; all security primitives, middleware, tools, and configuration in a single public module)
- IntegrityLabel enum (TRUSTED/UNTRUSTED)
- ConfidentialityLabel enum (PUBLIC/PRIVATE/USER_IDENTITY)
- ContentLabel class with serialization support
- combine_labels() function for label composition
- ContentVariableStore for client-side content storage
- VariableReferenceContent for variable indirection
- LabeledMessage class (inherits from Message) for message-level tracking
- check_confidentiality_allowed() helper for data exfiltration prevention
- LabelTrackingFunctionMiddleware - Tracks and propagates security labels
- PolicyEnforcementFunctionMiddleware - Enforces security policies
- SecureAgentConfig extends ContextProvider - automatic secure agent configuration
- quarantined_llm() - Isolated LLM calls with labeled data
- inspect_variable() - Controlled variable content inspection
- store_untrusted_content() - Helper for manual variable indirection (legacy)
- get_security_tools() - Returns list of security tools
- SECURITY_TOOL_INSTRUCTIONS - Detailed guidance for agents

FIDES_DEVELOPER_GUIDE.md (~1250 lines)
- Located at python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md
- Complete documentation of the FIDES security system
- Architecture overview and design rationale
- Usage examples (6+ comprehensive scenarios)
- Best practices and configuration options
- API reference with full parameter documentation
- Data exfiltration prevention documentation

python/packages/core/tests/test_security.py (~800+ lines)
- Unit tests for ContentLabel and label operations
- Tests for ContentVariableStore functionality
- Tests for VariableReferenceContent
- Middleware behavior tests (label tracking and policy enforcement)
- Automatic hiding tests
- Per-item embedded label tests
- Context label tracking tests
- Message-level tracking tests (Phase 1)
- Data exfiltration prevention tests

docs/decisions/0024-prompt-injection-defense.md
- Architecture Decision Record (ADR)
- Design rationale and alternatives considered
- Security properties and guarantees

python/samples/02-agents/security/README.md
- Sample-focused entry point for the two runnable FIDES security samples
- Prerequisites, run commands, and links to the developer guide for deeper details

Files Modified

python/packages/core/agent_framework/__init__.py
- Removed root-level security exports so agent_framework.security is the canonical import surface

Core Features

1. Content Labeling Infrastructure

IntegrityLabel: TRUSTED (user input) vs UNTRUSTED (AI-generated, external)
ConfidentialityLabel: PUBLIC, PRIVATE, USER_IDENTITY
Label Combination: combine_labels() applies the most restrictive policy (any UNTRUSTED input makes the result UNTRUSTED) and merges metadata
Serialization: Full support for to_dict() and from_dict()
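
The labeling rules above can be illustrated with a small standalone model. This is a simplified sketch, not the code in agent_framework.security: the names `Integrity`, `Confidentiality`, `Label`, and `combine` are hypothetical stand-ins, and the ordering PUBLIC < PRIVATE < USER_IDENTITY is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class Integrity(IntEnum):
    # Higher value = more restrictive (assumed ordering for this sketch).
    TRUSTED = 0
    UNTRUSTED = 1


class Confidentiality(IntEnum):
    # Assumed ordering: PUBLIC < PRIVATE < USER_IDENTITY.
    PUBLIC = 0
    PRIVATE = 1
    USER_IDENTITY = 2


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for a content label with dict serialization."""

    integrity: Integrity = Integrity.TRUSTED
    confidentiality: Confidentiality = Confidentiality.PUBLIC
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {
            "integrity": self.integrity.name.lower(),
            "confidentiality": self.confidentiality.name.lower(),
            "metadata": dict(self.metadata),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Label":
        return cls(
            integrity=Integrity[data["integrity"].upper()],
            confidentiality=Confidentiality[data["confidentiality"].upper()],
            metadata=dict(data.get("metadata", {})),
        )


def combine(*labels: Label) -> Label:
    """Join labels: the most restrictive value wins on each axis."""
    merged: dict[str, Any] = {}
    for lb in labels:
        merged.update(lb.metadata)  # later labels overwrite duplicate keys
    return Label(
        integrity=max((lb.integrity for lb in labels), default=Integrity.TRUSTED),
        confidentiality=max(
            (lb.confidentiality for lb in labels), default=Confidentiality.PUBLIC
        ),
        metadata=merged,
    )
```

Combining a TRUSTED/PRIVATE label with an UNTRUSTED/PUBLIC one yields UNTRUSTED/PRIVATE in this model, and `Label.from_dict(label.to_dict())` round-trips to an equal label.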

2. Per-Item Embedded Labels

Tools returning mixed-trust data embed labels on individual items using Content.from_text():

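import json
from agent_framework import Content, tool

# Placeholder inbox for illustration; a real tool would load this from a mail API.
emails = [
    {"id": "1", "body": "Team lunch at noon.", "internal": True},
    {"id": "2", "body": "Click here to claim your prize!", "internal": False},
]

@tool(description="Fetch emails from inbox")
async def fetch_emails(count: int = 5) -> list[Content]:
    # Each item carries its own label: internal mail is trusted, external is not.
    return [
        Content.from_text(
            json.dumps({
                "id": email["id"],
                "body": email["body"],
            }),
            additional_properties={
                "security_label": {
                    "integrity": "trusted" if email["internal"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        )
        for email in emails[:count]
    ]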

These embedded labels are automatically consumed by LabelTrackingFunctionMiddleware, which:

Extracts the security_label from additional_properties
Uses the embedded label as the highest-priority source for that item
Automatically hides UNTRUSTED items in the variable store
Replaces hidden items with VariableReferenceContent in the LLM context
Preserves TRUSTED items visible to the LLM without tainting the context label

This enables tools to return mixed-trust data where some items (internal emails) remain visible while untrusted items (external emails) are automatically hidden without manual intervention.
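
The per-item handling described above can be sketched without the framework. The following is a minimal model under stated assumptions: `Item`, `VariableRef`, `VariableStore`, and `hide_untrusted` are hypothetical names that stand in for the roles played by labeled Content items, VariableReferenceContent, ContentVariableStore, and the middleware, and the `var_` ID format is only illustrative.

```python
import uuid
from dataclasses import dataclass
from typing import Union


@dataclass
class Item:
    text: str
    integrity: str  # "trusted" or "untrusted", as in the security_label above


@dataclass
class VariableRef:
    variable_id: str  # opaque handle; the hidden text is not included


class VariableStore:
    """Hypothetical client-side store keyed by opaque variable IDs."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def put(self, text: str) -> str:
        variable_id = f"var_{uuid.uuid4().hex[:8]}"
        self._values[variable_id] = text
        return variable_id

    def get(self, variable_id: str) -> str:
        return self._values[variable_id]


def hide_untrusted(
    items: list[Item], store: VariableStore
) -> list[Union[Item, VariableRef]]:
    """Keep trusted items visible; swap untrusted ones for references."""
    visible: list[Union[Item, VariableRef]] = []
    for item in items:
        if item.integrity == "untrusted":
            visible.append(VariableRef(store.put(item.text)))
        else:
            visible.append(item)
    return visible
```

In this model the list handed to the LLM contains the trusted item unchanged and only an opaque ID for the untrusted one; the original text stays retrievable from the store for tools that are allowed to see it.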


### 3. Automatic Variable Hiding

This feature automatically hides any UNTRUSTED content returned by tools and keeps the hiding logic transparent to the developer, who does not need to call `store_untrusted_content()` manually. As a result, the LLM/agent context stays free of untrusted content. Key aspects include:

- **Automatic Detection**: Middleware checks integrity label after each tool call
- **Automatic Storage**: UNTRUSTED results/items stored in variable store
- **Transparent Replacement**: LLM context receives `VariableReferenceContent`
- **Context Label Protection**: Hidden content does NOT taint context label

### 4. Context Label Tracking

- Context label starts as TRUSTED + PUBLIC
- Gets updated (tainted) when non-hidden untrusted content enters context
- Policy enforcement uses context label for validation
- Provides `get_context_label()` and `reset_context_label()` methods
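
The tracking rules above can be modeled in a few lines. This is a sketch under assumptions, not the middleware's implementation: `ContextTracker`, `observe`, and `reset` are hypothetical names, and the confidentiality ordering is assumed for the example.

```python
from dataclasses import dataclass

# Assumed ordering from least to most restrictive.
_ORDER = ["public", "private", "user_identity"]


@dataclass
class ContextTracker:
    """Hypothetical model of a cumulative context label."""

    integrity: str = "trusted"
    confidentiality: str = "public"

    def observe(self, integrity: str, confidentiality: str, hidden: bool) -> None:
        # Hidden content never reaches the LLM, so it leaves the label alone.
        if hidden:
            return
        if integrity == "untrusted":
            self.integrity = "untrusted"  # sticky until reset
        if _ORDER.index(confidentiality) > _ORDER.index(self.confidentiality):
            self.confidentiality = confidentiality

    def reset(self) -> None:
        self.integrity, self.confidentiality = "trusted", "public"
```

Observing hidden untrusted content leaves the tracker at trusted/public, while the same content observed as visible moves it to untrusted, where it stays until `reset()` is called.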

### 5. Data Exfiltration Prevention

Tools declare `max_allowed_confidentiality` to prevent sensitive data leakage:

```python
from agent_framework import tool

@tool(
    description="Post to public Slack channel",
    additional_properties={
        "max_allowed_confidentiality": "public",  # Blocks PRIVATE data
    }
)
async def post_to_slack(channel: str, message: str) -> dict:
    return {"status": "posted"}
```

### 6. SecureAgentConfig (Context Provider)

SecureAgentConfig extends ContextProvider for automatic secure agent configuration:

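from agent_framework.security import SecureAgentConfig

# `Agent`, `client`, `quarantine_client`, and `my_tool` are assumed to be imported or defined elsewhere.
config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web", "fetch_data"},
    block_on_violation=True,
    quarantine_chat_client=quarantine_client,  # Optional: real LLM for quarantine
)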

# Context provider injects tools, instructions, and middleware automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[my_tool],
    context_providers=[config],  # That's it!
)

Security Properties

Deterministic Defense

Tiered label propagation: Every tool result receives a label via 3-tier priority (embedded > source_integrity > input labels join)
Context tracking: Cumulative security state tracked across turns
Policy enforcement: Violations blocked before execution
Content isolation: Untrusted content stored as variables
Taint propagation: Once context becomes UNTRUSTED, it stays UNTRUSTED
Data exfiltration prevention: max_allowed_confidentiality gates output destinations
Audit trail: All security events logged
No runtime guessing: Deterministic label assignment
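
The 3-tier priority listed above (embedded > source_integrity > input labels join) can be written as a small pure function. This is an illustrative sketch: `resolve_integrity` is a hypothetical name, and treating an empty input list as trusted is an assumption of the example (the real middleware may fall back to a configured default instead).

```python
from typing import Optional


def resolve_integrity(
    embedded: Optional[str],
    source_integrity: Optional[str],
    input_labels: list[str],
) -> str:
    """Pick an integrity label using the embedded > source_integrity > inputs order."""
    if embedded is not None:  # tier 1: label embedded on the item itself
        return embedded
    if source_integrity is not None:  # tier 2: the tool's declared fallback
        return source_integrity
    # tier 3: join of the input labels; any untrusted input taints the result
    return "untrusted" if "untrusted" in input_labels else "trusted"
```

Because the function depends only on its arguments, the same inputs always produce the same label, which is the sense in which label assignment is deterministic.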

Attack Prevention

Direct prompt injection: Variables hide actual content from LLM
Indirect prompt injection: Labels track untrusted AI-generated calls
Privilege escalation: Policy blocks untrusted calls to privileged tools
Data exfiltration: Confidentiality labels + max_allowed_confidentiality enforced
Tool misuse: Only whitelisted tools accept untrusted inputs

Configuration Options

LabelTrackingFunctionMiddleware

default_integrity: Default label for unknown sources
default_confidentiality: Default confidentiality level
auto_hide_untrusted: Enable automatic variable hiding (default: True)
hide_threshold: Integrity level at which hiding occurs (default: UNTRUSTED)

PolicyEnforcementFunctionMiddleware

allow_untrusted_tools: Set of tools accepting untrusted inputs
block_on_violation: Block vs warn on violations
enable_audit_log: Enable/disable audit logging
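
How these three options might interact can be sketched as follows. `PolicySketch` and its `check` method are hypothetical and only model the decision; the actual PolicyEnforcementFunctionMiddleware may record different audit fields and handle approvals as well.

```python
from dataclasses import dataclass, field


@dataclass
class PolicySketch:
    """Hypothetical model of the enforcement decision for one tool call."""

    allow_untrusted_tools: set[str] = field(default_factory=set)
    block_on_violation: bool = True
    enable_audit_log: bool = True
    audit_log: list[dict] = field(default_factory=list)

    def check(self, tool_name: str, context_integrity: str) -> bool:
        """Return True if the tool call may proceed."""
        violation = (
            context_integrity == "untrusted"
            and tool_name not in self.allow_untrusted_tools
        )
        if violation and self.enable_audit_log:
            self.audit_log.append(
                {"tool": tool_name, "blocked": self.block_on_violation}
            )
        # In warn-only mode a violation is recorded but the call still runs.
        return not (violation and self.block_on_violation)
```

With `allow_untrusted_tools={"search_web"}`, a call to `search_web` passes under an untrusted context, while a call to any other tool is blocked and logged.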

Tool Metadata (via `additional_properties`)

confidentiality: Tool's output confidentiality level
source_integrity: Fallback integrity for unlabeled results (data-producing tools only)
accepts_untrusted: Explicit untrusted input permission
max_allowed_confidentiality: Maximum allowed input confidentiality (for sink tools)
requires_approval: Human-in-the-loop requirement
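
The max_allowed_confidentiality gate can be pictured as a comparison on an ordered scale. The sketch below is an assumption-laden model, not the library's check_confidentiality_allowed() helper: `confidentiality_allowed` is a hypothetical function, and the ordering public < private < user_identity is assumed.

```python
from typing import Optional

# Assumed ordering from least to most restrictive.
LEVELS = {"public": 0, "private": 1, "user_identity": 2}


def confidentiality_allowed(data_level: str, max_allowed: Optional[str]) -> bool:
    """Return True if data at data_level may flow into a sink capped at max_allowed."""
    if max_allowed is None:  # the tool declares no cap
        return True
    return LEVELS[data_level] <= LEVELS[max_allowed]
```

Under this model a sink tool capped at "public" rejects a context labeled "private", which is how a tool such as a public channel poster would be kept from receiving private data.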

Usage Pattern

Recommended: SecureAgentConfig as Context Provider

from agent_framework.security import SecureAgentConfig

config = SecureAgentConfig(
    auto_hide_untrusted=True,
    allow_untrusted_tools={"search_web"},
    block_on_violation=True,
)

# Context provider injects everything automatically
agent = Agent(
    client=client,
    name="secure_assistant",
    instructions="You are a helpful assistant.",
    tools=[search_web],
    context_providers=[config],  # Tools, instructions, and middleware injected via before_run()
)

Processing Hidden Content with quarantined_llm

from agent_framework.security import quarantined_llm

# Agent automatically uses quarantined_llm with variable_ids
result = await quarantined_llm(
    prompt="Summarize this data",
    variable_ids=["var_abc123"]  # Reference hidden content by ID
)
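
Conceptually, a quarantined call resolves the referenced variables, passes their content to a model that has no tools, and labels the output. The sketch below illustrates that idea only: `run_quarantined`, the plain dict used as a store, and the `llm` callable are hypothetical, and marking the result as untrusted is an assumption about how derived output is labeled.

```python
from typing import Callable


def run_quarantined(
    prompt: str,
    variable_ids: list[str],
    store: dict[str, str],
    llm: Callable[[str], str],
) -> dict:
    """Resolve hidden variables, call an isolated model, label the output."""
    resolved = "\n\n".join(store[v] for v in variable_ids)
    # The isolated model sees the raw content but has no tools to call.
    text = llm(f"{prompt}\n\n---\n{resolved}")
    # Output derived from untrusted input is itself treated as untrusted.
    return {"text": text, "integrity": "untrusted"}
```

The main agent only ever handles variable IDs; the raw content is read inside the quarantined call, where an injected instruction cannot trigger a tool.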

Testing

Comprehensive test suite with:

115+ unit tests covering all components
Label creation, serialization, combination
Variable store operations
Middleware behavior (tracking and enforcement)
Automatic hiding with per-item labels
Context label tracking
Message-level tracking (Phase 1)
Data exfiltration prevention
Policy violation scenarios
Audit log verification

Run tests:

cd python/packages/core && ../../.venv/bin/pytest tests/test_security.py -v

Code Statistics

Total lines: ~2,950+ lines (single security.py module)
New modules: 1 (security.py — consolidated from 3 original modules)
Total tests: 115+ unit tests
Documentation: 1,250+ lines in developer guide
Examples: 6+ comprehensive scenarios

Deliverables Checklist

Core Implementation

✅ ContentLabel infrastructure with integrity and confidentiality
✅ ContentVariableStore for variable indirection
✅ VariableReferenceContent for safe context references
✅ LabelTrackingFunctionMiddleware for automatic labeling
✅ PolicyEnforcementFunctionMiddleware for policy enforcement
✅ quarantined_llm tool for isolated processing
✅ inspect_variable tool for controlled content access
✅ store_untrusted_content helper for manual variable indirection

Automatic Hiding Enhancement

✅ Auto-hide UNTRUSTED content with auto_hide_untrusted flag
✅ Per-middleware ContentVariableStore instances
✅ Thread-local storage for middleware access from tools
✅ Automatic UNTRUSTED content replacement

Per-Item Embedded Labels

✅ Support for additional_properties.security_label on individual items
✅ Mixed-trust data handling (hide untrusted, keep trusted visible)
✅ Fallback to source_integrity for unlabeled items

Context Label Tracking

✅ Cumulative context label tracking across turns
✅ Hidden content does NOT taint context
✅ get_context_label() and reset_context_label() methods
✅ Policy enforcement uses context label

Data Exfiltration Prevention

✅ max_allowed_confidentiality tool property
✅ check_confidentiality_allowed() helper function
✅ Policy enforcement validates confidentiality flow

SecureAgentConfig

✅ Context provider pattern with ContextProvider base class
✅ before_run() hook for automatic injection of tools, instructions, and middleware
✅ One-line secure agent configuration via context_providers=[config]
✅ get_tools(), get_instructions(), get_middleware() methods (for manual use)
✅ quarantine_chat_client support for real LLM calls
✅ SECURITY_TOOL_INSTRUCTIONS constant

Documentation & Testing

✅ Complete FIDES Developer Guide (~1250 lines)
✅ Architecture Decision Record (ADR)
✅ Quick Start Guide
✅ Comprehensive test suite (115+ tests)
✅ Example code with 6+ scenarios
✅ 2 runnable security samples (email, repo confidentiality)

Summary

FIDES provides a comprehensive, deterministic defense against prompt injection attacks with:

Zero-effort protection: Automatic variable hiding for developers
Context provider pattern: SecureAgentConfig extends ContextProvider for automatic setup
Granular control: Per-item embedded labels via Content.from_text() for mixed-trust data
Easy configuration: SecureAgentConfig for one-line setup
Data safety: Exfiltration prevention via confidentiality gates
Full traceability: Message-level label tracking
Complete auditability: All security events logged

With automatic hiding enabled, untrusted content is replaced by variable references before it reaches the LLM context, and every tool call is policy-checked against the cumulative context label before it executes.
