Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099)

* Adding AgentFileStore and FileAccessProvider to support file-based operations for agents.

* Address PR review feedback on FileAccessProvider

- Probe symlinks on the unresolved candidate path so in-root symlinks
  cannot silently pass and out-of-root symlinks surface the correct
  error message.
- Validate matching_lines elements in FileSearchResult.from_dict and
  raise a clean ValueError for non-mapping entries.
- Cap search regex pattern length (256 chars) via a new
  _compile_search_regex helper to mitigate ReDoS, and surface the cap
  in the file_access_search_files tool description.
- Skip non-UTF-8 files during filesystem search instead of aborting
  the entire directory walk.
- Replace the module-scope trailing string in the data-processing
  sample with comments to avoid Ruff B018.
- Remove the checked-in working/region_totals.md sample artifact so
  the save flow works from a clean checkout.
- Expand the Windows stdout reconfiguration comment in task_runner.py
  for clarity.
- Add tests for invalid/oversize regex, non-UTF-8 file search, and
  in-root symlink rejection.
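
The pattern-length cap described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `compile_search_regex` is a hypothetical stand-in for the private `_compile_search_regex` helper, and the error messages are assumptions. Only the 256-character cap and the `ValueError` surface come from the notes above.

```python
import re

# Cap taken from the commit notes; the constant name is an assumption.
MAX_PATTERN_LENGTH = 256


def compile_search_regex(pattern: str) -> re.Pattern[str]:
    """Hypothetical stand-in for the _compile_search_regex helper."""
    # Reject oversized patterns before handing them to the regex engine.
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"Search pattern exceeds {MAX_PATTERN_LENGTH} characters.")
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Translate engine errors into one recoverable exception type.
        raise ValueError(f"Invalid search pattern: {exc}") from exc
```

A length cap only bounds the pattern size; a short pathological pattern can still backtrack heavily, so a cap like this is best paired with a timeout.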

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy redundant-cast in FileSearchResult.from_dict

Use cast(list[object], ...) instead of cast(list[Any], ...) so the
cast represents a real type change (lists are invariant) and is no
longer flagged by mypy as redundant, while still satisfying pyright's
reportUnknownVariableType. Matches the existing pattern in _memory.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten path normalization and directory resolution in FileAccess

- _normalize_relative_path now strips surrounding whitespace up front
  so leading/trailing spaces never leak into file segments, and
  rejects trailing path separators for file paths so 'foo/' is no
  longer silently coerced to 'foo'.
- FileSystemAgentFileStore._resolve_safe_directory_path normalizes
  with is_directory=True and maps an empty normalized result to the
  root. This matches InMemoryAgentFileStore so whitespace-only
  directory inputs resolve to the root instead of raising.
- Added tests for whitespace stripping, trailing-separator rejection,
  and whitespace-only directory listing on the filesystem store.
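
A rough sketch of the normalization rules listed above, for illustration only. `normalize_relative_path` is a hypothetical stand-in for `_normalize_relative_path`; the rejection of empty file paths and of `.`/`..` segments is an assumption, not something confirmed by these notes.

```python
def normalize_relative_path(path: str, *, is_directory: bool = False) -> str:
    """Hypothetical stand-in for _normalize_relative_path."""
    # Strip surrounding whitespace up front so it never leaks into segments.
    stripped = path.strip()
    # File paths may not end in a separator: 'foo/' is not coerced to 'foo'.
    if not is_directory and stripped.endswith(("/", "\\")):
        raise ValueError("File paths must not end with a path separator.")
    segments = [part for part in stripped.replace("\\", "/").split("/") if part]
    # Assumption: traversal segments are rejected outright.
    if any(part in (".", "..") for part in segments):
        raise ValueError("Path segments '.' and '..' are not allowed.")
    # Assumption: an empty file path is an error; an empty directory is the root.
    if not segments and not is_directory:
        raise ValueError("File path must not be empty.")
    return "/".join(segments)
```

With this sketch, `normalize_relative_path("   ", is_directory=True)` returns the empty string, which a store would then map to its root.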

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Harden FileAccess search and atomic save in store API

- Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop.
- Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store.
- Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection.
- Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`.
- `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip.
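
The `open(mode='x')` approach to `overwrite=False` can be illustrated with a small synchronous sketch. The free function `write_file` below is hypothetical and much simpler than the store method (no path validation, no async); it only shows why exclusive-create avoids a separate existence check.

```python
from pathlib import Path


def write_file(path: Path, content: str, *, overwrite: bool = True) -> None:
    """Hypothetical, simplified illustration of an atomic no-overwrite write."""
    # 'x' asks the OS to create the file exclusively: if it already exists,
    # open() raises FileExistsError, so two concurrent callers cannot both
    # pass a check and then clobber each other.
    mode = "w" if overwrite else "x"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(content)
```

Because `FileExistsError` is a subclass of `OSError`, a caller that already handles `OSError` also covers this case.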

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Python 3.10 timeout handling and add directory arg to list/search tools

- Catch asyncio.TimeoutError in _run_search_with_timeout. In Python 3.10
  asyncio.wait_for raises asyncio.exceptions.TimeoutError, which is
  distinct from the builtin TimeoutError (the two were unified in 3.11).
  Catching the asyncio alias works on every supported version.
- Add an optional directory parameter to file_access_list_files and
  file_access_search_files so agents can enumerate / scope searches to
  nested folders, not just the store root.
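
A minimal sketch of the timeout handling described above, assuming the scan is offloaded with `asyncio.to_thread`. `run_with_timeout` is a hypothetical name, and converting the timeout into a `ValueError` is an assumption made for this example; only the 10-second limit and the `asyncio.TimeoutError` catch come from these notes.

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")

# Wall-clock limit from the commit notes; the constant name is an assumption.
SEARCH_TIMEOUT_SECONDS = 10.0


async def run_with_timeout(func: Callable[[], T], timeout: float = SEARCH_TIMEOUT_SECONDS) -> T:
    """Hypothetical sketch: run a blocking scan in a worker thread with a deadline."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func), timeout=timeout)
    except asyncio.TimeoutError as exc:
        # asyncio.TimeoutError is the builtin TimeoutError on 3.11+ and a
        # distinct class on 3.10, so this clause works on both.
        raise ValueError(f"Search timed out after {timeout} seconds.") from exc
```

Note that the deadline only stops the caller from waiting: a Python worker thread cannot be interrupted, so the scan itself keeps running until it finishes.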

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address FileAccess review feedback: case, errors, signal, TOCTOU

- InMemoryAgentFileStore now stores (display_name, content) so list_files
  and search_files return the original-case names callers wrote, matching
  the behaviour of FileSystemAgentFileStore on case-preserving filesystems
  and removing the silent in-memory vs. on-disk contract divergence.
- FileSystemAgentFileStore.read_file raises ValueError instead of letting
  UnicodeDecodeError bubble for binary / non-UTF-8 input, restoring
  symmetry with search_files (which still skips) and giving the tool
  layer a recoverable type to translate.
- Tool wrappers now catch ValueError and OSError around every operation
  and surface them as readable strings, so 'you used ..' and 'the file
  already exists' are both reported to the model the same way instead of
  the former crashing out as an unhandled exception.
- _search_files_sync logs per skipped non-UTF-8 file at WARNING and an
  aggregate INFO summary so operators can distinguish 'no matches' from
  'half the corpus was unreadable'.
- FileSystemAgentFileStore softens its docstrings to acknowledge the
  inherent probe-then-open TOCTOU window. On POSIX both read and write
  now pass O_NOFOLLOW so the kernel refuses if the leaf segment becomes
  a symlink between the probe and the open. Windows has no equivalent
  flag; the limitation is documented.
- Tests cover: case preservation on list/search, ValueError on non-UTF-8
  read at the store and tool layer, tool-layer string responses for
  path-traversal and oversized-regex inputs, search-skip log output,
  symlink rejection on delete/search/list, and symlinked intermediate
  directory rejection.
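
The `O_NOFOLLOW` read path and the non-UTF-8 translation can be sketched together. `read_text_no_follow` is a hypothetical helper written for this note; the real store additionally probes the path before opening, which is omitted here.

```python
import os


def read_text_no_follow(path: str) -> str:
    """Hypothetical sketch: read UTF-8 text, refusing a symlinked leaf on POSIX."""
    # O_NOFOLLOW is missing on Windows, so fall back to no extra flag there.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    # With O_NOFOLLOW the kernel fails the open (an OSError) if the final
    # path component is a symlink, closing the probe-then-open window.
    fd = os.open(path, flags)
    with os.fdopen(fd, "r", encoding="utf-8") as handle:
        try:
            return handle.read()
        except UnicodeDecodeError as exc:
            # Surface binary or non-UTF-8 content as a plain, readable error.
            raise ValueError(f"File is not valid UTF-8 text: {path}") from exc
```

`O_NOFOLLOW` only guards the last path segment; symlinked intermediate directories still need a separate check.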

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address FileAccess nit comments: docstrings, enumerate, opt-in delete approval

- Expand FileSearchMatch/FileSearchResult.to_dict docstrings to explain why
  the override is needed (__slots__ defeats the mixin's __dict__ iteration)
  and why exclude/exclude_none are accepted-but-ignored (mixin signature
  compatibility for callers like to_json).
- Use enumerate(lines, start=1) in _search_file_content so the +1 below is
  no longer needed; rename loop variable to line_number for clarity.
- Add opt-in require_delete_approval: bool = False on FileAccessProvider.
  When True, file_access_delete_file is registered with approval_mode
  'always_require' so the host must approve every delete. Default False
  preserves current behaviour and matches the .NET reference, but
  deployments that want a safer-by-default posture can enable it.
- Add tests covering both delete approval modes.
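
For reference, the `enumerate(lines, start=1)` change amounts to the pattern below. `search_file_content` is a simplified, hypothetical version of `_search_file_content` that returns plain tuples rather than the real result types.

```python
import re


def search_file_content(content: str, pattern: re.Pattern[str]) -> list[tuple[int, str]]:
    """Hypothetical sketch: return (line_number, line) for each matching line."""
    matches: list[tuple[int, str]] = []
    # start=1 yields 1-based line numbers directly, so no '+ 1' is needed.
    for line_number, line in enumerate(content.splitlines(), start=1):
        if pattern.search(line):
            matches.append((line_number, line))
    return matches
```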

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* FileAccess: require delete approval by default

Flip the default for FileAccessProvider(require_delete_approval=...) from
False to True so destructive deletes are gated by host approval out of the
box. Callers that want the previous autonomous behaviour (which matches the
.NET reference) can pass require_delete_approval=False.

Tests updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fixing linkinspector by installing Chrome for puppeteer first.

---------

Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ben Thomas
2026-05-28 13:09:50 -07:00
committed by GitHub
parent 0578f4c910
commit b000a2cf51
10 changed files with 2040 additions and 2 deletions
@@ -8,6 +8,7 @@ These samples demonstrate how to use context providers to enrich agent conversat
|---------------|-------------|
| [`simple_context_provider.py`](simple_context_provider.py) | Implement a custom context provider by extending `ContextProvider` to extract and inject structured user information across turns. |
| [`azure_ai_foundry_memory.py`](azure_ai_foundry_memory.py) | Use `FoundryMemoryProvider` to add semantic memory — automatically retrieves, searches, and stores memories via Azure AI Foundry. |
| [`file_access_data_processing/`](file_access_data_processing/) | Use `FileAccessProvider` with `FileSystemAgentFileStore` to give an agent read/write/search access to a folder of CSV data files. See its own [README](file_access_data_processing/README.md). |
| [`azure_ai_search/`](azure_ai_search/) | Retrieval Augmented Generation (RAG) with Azure AI Search in semantic and agentic modes. See its own [README](azure_ai_search/README.md). |
| [`mem0/`](mem0/) | Memory-powered context using the Mem0 integration (open-source and managed). See its own [README](mem0/README.md). |
| [`redis/`](redis/) | Redis-backed context providers for conversation memory and sessions. See its own [README](redis/README.md). |
@@ -25,4 +26,9 @@ These samples demonstrate how to use context providers to enrich agent conversat
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME`: Embedding model deployment name (e.g., `text-embedding-ada-002`)
- Azure CLI authentication (`az login`)
**For `file_access_data_processing/`:**
- `FOUNDRY_PROJECT_ENDPOINT`: Your Azure AI Foundry project endpoint
- `FOUNDRY_MODEL`: Chat model deployment name
- Azure CLI authentication (`az login`)
See each subfolder's README for provider-specific prerequisites.
@@ -0,0 +1,62 @@
# File Access Data Processing
This sample demonstrates how to give an `Agent` access to a folder of data files
by attaching `FileAccessProvider` (backed by `FileSystemAgentFileStore`) as a
context provider.
The agent is given a `working/` folder containing `sales.csv` (~50 rows of
sales transaction data) and is driven through a short scripted conversation
that exercises the list, read, and save tools the provider exposes:
| Step | Prompt | Tool(s) used |
|---|---|---|
| 1 | "What files do you have access to?" | `file_access_list_files` |
| 2 | "Read sales.csv and summarize…" | `file_access_read_file` |
| 3 | "Calculate the total revenue per region…" | (uses previously read data) |
| 4 | "Save a markdown report named `region_totals.md`…" | `file_access_save_file` |
| 5 | "List the files again so I can confirm…" | `file_access_list_files` |
After the run, the sample prints the final contents of `working/` so the
written file is easy to spot.
## Prerequisites
| Variable | Description |
|---|---|
| `FOUNDRY_PROJECT_ENDPOINT` | Your Azure AI Foundry project endpoint. |
| `FOUNDRY_MODEL` | Chat model deployment name (e.g. `gpt-4o`). |
Run `az login` before executing the sample so `AzureCliCredential` can
authenticate.
## Running the sample
From `python/`:
```bash
uv run --package agent-framework-core python samples/02-agents/context_providers/file_access_data_processing/data_processing.py
```
Or directly:
```bash
python samples/02-agents/context_providers/file_access_data_processing/data_processing.py
```
## Sample data
`working/sales.csv` contains January to March 2025 sales transactions with these
columns:
| Column | Description |
|---|---|
| `date` | Transaction date (YYYY-MM-DD) |
| `product` | Product name |
| `category` | Product category (Electronics, Furniture, Stationery) |
| `quantity` | Units sold |
| `unit_price` | Price per unit |
| `region` | Sales region (North, South, West) |
| `salesperson` | Name of the salesperson |
The sample writes `region_totals.md` into the same folder. Delete it between
runs if you want a clean state.
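
To sanity-check the agent's regional totals, the same calculation (`quantity * unit_price` summed per region) can be done locally. The snippet below is an illustrative sketch, not part of the sample: `revenue_by_region` is a hypothetical helper, and it runs on only the first four rows of `sales.csv` to stay short.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# First four data rows of working/sales.csv, inlined to keep this runnable.
SAMPLE = """\
date,product,category,quantity,unit_price,region,salesperson
2025-01-03,Laptop Pro 15,Electronics,2,1299.99,North,Alice
2025-01-05,Ergonomic Chair,Furniture,5,349.50,South,Bob
2025-01-07,Wireless Mouse,Electronics,12,24.99,North,Alice
2025-01-08,Standing Desk,Furniture,1,599.00,West,Carol
"""


def revenue_by_region(csv_text: str) -> dict[str, Decimal]:
    """Hypothetical helper: sum quantity * unit_price for each region."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Decimal avoids binary floating-point rounding on currency values.
        totals[row["region"]] += int(row["quantity"]) * Decimal(row["unit_price"])
    return dict(totals)


for region, total in sorted(revenue_by_region(SAMPLE).items()):
    print(f"{region}: {total}")
```

To check the full file, pass the text of `working/sales.csv` instead of `SAMPLE`.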
@@ -0,0 +1,145 @@
# Copyright (c) Microsoft. All rights reserved.
"""Sample: use ``FileAccessProvider`` to give an agent access to a folder of CSV data files.
This sample demonstrates how to attach :class:`FileAccessProvider` (backed by
:class:`FileSystemAgentFileStore`) to an ``Agent`` so the model can read input
data, perform analysis, and write summary output back to the same folder via
the ``file_access_*`` tools.
The sibling ``working/`` folder contains ``sales.csv`` — ~50 rows of sales
transactions (date, product, category, quantity, unit_price, region,
salesperson). The agent is asked, in a single session, to: list available
files, inspect the data, compute regional totals, and save a markdown summary.
Prerequisites:
- ``FOUNDRY_PROJECT_ENDPOINT``: Your Azure AI Foundry project endpoint.
- ``FOUNDRY_MODEL``: Chat model deployment name.
- Run ``az login`` before executing the sample.
"""
import asyncio
import os
from pathlib import Path
from agent_framework import Agent, FileAccessProvider, FileSystemAgentFileStore
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential
from dotenv import load_dotenv
# Load python/.env (python-dotenv walks up from this file by default). Pass
# override=True so values from .env take precedence over any pre-existing OS
# environment variables — without this, OS-level values silently win.
load_dotenv(override=True)
INSTRUCTIONS = """
You are a data analyst assistant. You have access to a folder of data files via
the file_access_* tools.
## Getting started
- Start by listing available files with file_access_list_files to see what data
is available.
- Read the files to understand their structure and contents.
## Working with data
- When asked to analyze data, read the relevant files first, then perform the
analysis.
- Show your analysis clearly with tables, summaries, and key insights.
- When calculations are needed, work through them step by step and show your
reasoning.
## Writing output
- When asked to produce output files (e.g., reports, summaries, filtered data),
use file_access_save_file to write them.
- Use appropriate file formats: CSV for tabular data, Markdown for reports.
- Confirm what you wrote and where.
## Important
- Never modify or delete the original input data files unless explicitly asked
to do so.
- If asked about data you haven't read yet, read it first before answering.
- Always explain your reasoning between tool calls so the user can follow along.
"""
PROMPTS = [
    "What files do you have access to?",
    "Read sales.csv and summarize what columns it contains and how many rows it has.",
    "Calculate the total revenue (quantity * unit_price) per region and show the result as a table.",
    (
        "Save a markdown report named region_totals.md that contains the regional totals "
        "and a one-paragraph summary of which region performed best."
    ),
    "List the files again so I can confirm region_totals.md was created.",
]


async def main() -> None:
    # 1. Resolve the working directory bundled alongside this script.
    working_dir = Path(__file__).parent / "working"

    # 2. Build the chat client.
    client = FoundryChatClient(
        project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
        model=os.environ["FOUNDRY_MODEL"],
        credential=AzureCliCredential(),
    )

    # 3. Wire up the file access provider against a file-system-backed store
    #    rooted at the sample's working/ folder. The provider injects its
    #    default instructions plus exposes five file_access_* tools to the
    #    agent for the duration of each run.
    file_access = FileAccessProvider(store=FileSystemAgentFileStore(working_dir))

    # 4. Create the agent and attach the provider.
    async with Agent(
        client=client,
        name="DataAnalyst",
        description="A data analyst assistant that reads, analyzes, and processes data files.",
        instructions=INSTRUCTIONS,
        context_providers=[file_access],
    ) as agent:
        # 5. Run all prompts inside one session so the conversation remains
        #    coherent across turns.
        session = agent.create_session()
        for prompt in PROMPTS:
            print(f"\nUser: {prompt}")
            response = await agent.run(prompt, session=session)
            print(f"Assistant: {response}")

    # 6. Show the final folder contents so the side effects of the run are
    #    visible to the reader.
    print("\nFinal contents of working/:")
    for path in sorted(working_dir.iterdir()):
        print(f"  - {path.name} ({path.stat().st_size} bytes)")


if __name__ == "__main__":
    asyncio.run(main())
# Sample output (truncated):
#
# User: What files do you have access to?
# Assistant: I can see one file in the working directory: sales.csv.
#
# User: Read sales.csv and summarize what columns it contains and how many rows it has.
# Assistant: sales.csv has 49 data rows and 7 columns: date, product, category,
# quantity, unit_price, region, salesperson.
#
# User: Calculate the total revenue (quantity * unit_price) per region and show the result as a table.
# Assistant:
# | Region | Total Revenue |
# |--------|---------------|
# | North | $X,XXX.XX |
# | South | $X,XXX.XX |
# | West | $X,XXX.XX |
#
# User: Save a markdown report named region_totals.md ...
# Assistant: I wrote region_totals.md to the working folder.
#
# User: List the files again so I can confirm region_totals.md was created.
# Assistant: The working folder now contains: region_totals.md, sales.csv.
#
# Final contents of working/:
# - region_totals.md (NNN bytes)
# - sales.csv (3175 bytes)
@@ -0,0 +1,50 @@
date,product,category,quantity,unit_price,region,salesperson
2025-01-03,Laptop Pro 15,Electronics,2,1299.99,North,Alice
2025-01-05,Ergonomic Chair,Furniture,5,349.50,South,Bob
2025-01-07,Wireless Mouse,Electronics,12,24.99,North,Alice
2025-01-08,Standing Desk,Furniture,1,599.00,West,Carol
2025-01-10,USB-C Hub,Electronics,8,45.99,North,David
2025-01-12,Monitor 27in,Electronics,3,429.00,South,Bob
2025-01-14,Desk Lamp,Furniture,6,79.95,West,Carol
2025-01-15,Keyboard Mech,Electronics,4,149.99,North,Alice
2025-01-17,Filing Cabinet,Furniture,2,189.00,South,David
2025-01-20,Webcam HD,Electronics,10,89.99,West,Bob
2025-01-22,Laptop Pro 15,Electronics,1,1299.99,South,Carol
2025-01-24,Ergonomic Chair,Furniture,3,349.50,North,Alice
2025-01-25,Notebook Pack,Stationery,20,12.99,South,David
2025-01-27,Wireless Mouse,Electronics,15,24.99,West,Carol
2025-01-28,Whiteboard,Stationery,4,129.00,North,Bob
2025-01-30,Standing Desk,Furniture,2,599.00,South,Alice
2025-02-02,USB-C Hub,Electronics,6,45.99,West,David
2025-02-04,Monitor 27in,Electronics,2,429.00,North,Carol
2025-02-05,Desk Lamp,Furniture,8,79.95,South,Bob
2025-02-07,Keyboard Mech,Electronics,5,149.99,West,Alice
2025-02-09,Filing Cabinet,Furniture,1,189.00,North,David
2025-02-11,Webcam HD,Electronics,7,89.99,South,Carol
2025-02-13,Laptop Pro 15,Electronics,3,1299.99,West,Bob
2025-02-15,Notebook Pack,Stationery,30,12.99,North,Alice
2025-02-17,Ergonomic Chair,Furniture,4,349.50,South,David
2025-02-19,Wireless Mouse,Electronics,20,24.99,North,Carol
2025-02-20,Whiteboard,Stationery,2,129.00,West,Bob
2025-02-22,Standing Desk,Furniture,1,599.00,North,Alice
2025-02-24,USB-C Hub,Electronics,10,45.99,South,David
2025-02-26,Monitor 27in,Electronics,4,429.00,West,Carol
2025-02-28,Desk Lamp,Furniture,3,79.95,North,Bob
2025-03-02,Keyboard Mech,Electronics,6,149.99,South,Alice
2025-03-04,Filing Cabinet,Furniture,3,189.00,West,David
2025-03-06,Webcam HD,Electronics,9,89.99,North,Carol
2025-03-08,Laptop Pro 15,Electronics,2,1299.99,South,Bob
2025-03-10,Notebook Pack,Stationery,25,12.99,West,Alice
2025-03-12,Ergonomic Chair,Furniture,6,349.50,North,David
2025-03-14,Wireless Mouse,Electronics,18,24.99,South,Carol
2025-03-15,Whiteboard,Stationery,5,129.00,North,Bob
2025-03-17,Standing Desk,Furniture,3,599.00,West,Alice
2025-03-19,USB-C Hub,Electronics,7,45.99,North,David
2025-03-21,Monitor 27in,Electronics,5,429.00,South,Carol
2025-03-23,Desk Lamp,Furniture,4,79.95,West,Bob
2025-03-25,Keyboard Mech,Electronics,3,149.99,North,Alice
2025-03-27,Filing Cabinet,Furniture,2,189.00,South,David
2025-03-28,Webcam HD,Electronics,11,89.99,West,Carol
2025-03-29,Laptop Pro 15,Electronics,1,1299.99,North,Bob
2025-03-30,Notebook Pack,Stationery,15,12.99,South,Alice
2025-03-31,Ergonomic Chair,Furniture,2,349.50,West,David