Python: feat: add agent-framework-monty (Monty-backed CodeAct provider) (#5915)

* Python: feat: add agent-framework-monty (Monty-backed CodeAct)

New alpha package that wraps pydantic-monty (a Rust-based Python
interpreter) behind the same CodeAct API surface as
agent-framework-hyperlight, so users can swap providers with minimal
code change.

Public API (agent_framework_monty):
- MontyCodeActProvider — ContextProvider that injects a run-scoped
  execute_code tool plus dynamic CodeAct instructions.
- MontyExecuteCodeTool — standalone FunctionTool for mixed-tool agents
  or manual static wiring.
- FileMount / FileMountInput / MountMode — public types mirroring the
  Hyperlight names, with Monty's mode (read-only/read-write/overlay)
  and write_bytes_limit on FileMount.

Constructor kwargs (both classes) mirror Hyperlight where possible:
tools, approval_mode, workspace_root, file_mounts; plus a Monty-only
resource_limits forwarding ResourceLimits to Monty.start().

Filesystem flow:
- workspace_root auto-mounts at /input (read-write), matching Hyperlight.
- file_mounts accepts string shorthand, (host, mount) tuple, or
  FileMount with mode + write cap.
- Files written under read-write mounts are scanned post-execution and
  returned as Content.from_data items (mirrors Hyperlight /output).
- overlay mounts buffer writes in-memory; read-only mounts reject writes.

Internals:
- _monty_bridge.InlineCodeBridge ports the inline (non-durable) bridge
  from anthonychu/maf-codeact-monty-python; handles FunctionSnapshot /
  FutureSnapshot pause/resume, dispatches direct typed calls + the
  call_tool fallback, forwards mount/limits to Monty.start(...).
- generate_type_stubs emits per-tool stubs so Monty's `ty` type-checker
  rejects bad calls before any host tool runs.

Alpha-policy compliance (per python-package-management skill):
- Added agent-framework-monty = { workspace = true } to root
  pyproject.toml.
- Added row to python/PACKAGE_STATUS.md.
- Added monty entry under Experimental in python/AGENTS.md.
- NOT added to core[all]; NO agent_framework.monty lazy shim (deferred
  to beta promotion).

Samples (three sets, import from agent_framework_monty directly):
- samples/02-agents/context_providers/code_act/monty_code_act.py
  (provider pattern) + updated local README.
- samples/02-agents/tools/monty_code_interpreter/ (standalone +
  manual-wiring + README).
- samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/
  (full hosted-agent layout with uv-based pyproject.toml + Dockerfile,
  Azure Monitor wiring via APPLICATIONINSIGHTS_CONNECTION_STRING +
  enable_instrumentation, ENABLE_INSTRUMENTATION and
  ENABLE_SENSITIVE_DATA env vars). The alpha wheel is vendored into
  ./wheels/ (gitignored) via vendor-wheel.sh; new row added to the
  parent Responses-API README.

Tests:
- 28 hermetic unit tests (stubbed pydantic_monty).
- 18 integration tests marked @pytest.mark.integration, auto-skipped
  when pydantic_monty is unimportable; exercise the real Monty
  runtime: print round-trip, last-expression value, direct typed
  tool dispatch, call_tool fallback, async tool, asyncio.gather
  parallelism, ty type-check rejection, OS blocked by default,
  workspace_root read+write capture, read-only / overlay mount
  semantics, resource_limits.max_duration_secs abort, approval
  gating end-to-end, full Agent run with a scripted chat client.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix: monty FileMount test compares against the normalized POSIX path

The shorthand string mount goes through _normalize_mount_path, which
rewrites Windows drive letters like 'C:\\Users\\...' into
'/C:/Users/...' (POSIX-style). The Windows CI runners surfaced this
because tmp_path resolves to a backslashed Windows path; the test was
comparing against the raw str(host_a) instead of the normalized form.

Compare against _normalize_mount_path(str(host_a)) so the assertion is
platform-independent.
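
A minimal sketch of the normalization this fix relies on, assuming only the behavior stated above (a drive-letter path such as 'C:\\Users\\...' becomes '/C:/Users/...'). The name `normalize_mount_path_sketch` and its handling of paths without a drive letter are illustrative assumptions, not the package's `_normalize_mount_path`.

```python
from pathlib import PureWindowsPath


def normalize_mount_path_sketch(path: str) -> str:
    """Hypothetical stand-in: render a host path in POSIX style on any platform."""
    win = PureWindowsPath(path)
    if win.drive:
        # 'C:\\Users\\me' -> 'C:/Users/me' -> '/C:/Users/me'
        return "/" + win.as_posix()
    # Assumption: paths without a drive letter only need their separators unified.
    return path.replace("\\", "/")
```

Comparing both sides of an assertion through the same helper, as the fix does, keeps the test independent of what `tmp_path` looks like on the runner.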

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix: address PR #5915 review feedback

- _execute_code_tool docstring: clarify that the Monty backend supports
  scoped filesystem access via workspace_root / file_mounts (blocked by
  default).
- _to_monty_mount: import pydantic_monty lazily through load_monty so
  missing-dependency errors surface as the same actionable RuntimeError
  the rest of the package raises (not a bare ImportError at module load).
  Renamed _load_monty -> load_monty for the same reason.
- _python_type_repr: emit None for type(None) instead of Any, and
  normalize both typing.Union[...] and PEP-604 X | Y to PEP-604 syntax
  so Optional[X] / Union[..., None] / -> None signatures round-trip
  correctly through ty validation. Added a regression test.
- _PrintCollector: track a running character count instead of
  recomputing sum(len(c) for c in self.chunks) per callback. Eliminates
  the O(n^2) cost on print-heavy code.
- Instructions: mention that the value of the final expression is also
  returned alongside captured stdout (matches actual behavior).
- 11_monty_codeact Dockerfile: pin ghcr.io/astral-sh/uv to 0.11.6
  instead of :latest for reproducible builds.
- 11_monty_codeact README: replace the bare "see parent README" pointer
  with sample-specific steps (./vendor-wheel.sh + uv sync + uv run),
  since the sample uses pyproject.toml + a vendored wheel rather than
  requirements.txt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: sample: 11_monty_codeact installs agent-framework-monty from PyPI

Drop the vendored-wheel scaffolding now that agent-framework-monty is on
PyPI as an alpha (1.0.0a*) release:

- pyproject.toml: remove [tool.uv.sources] override; keep [tool.uv]
  prerelease = "allow" so uv pulls the alpha automatically.
- Dockerfile: drop the COPY wheels/ step.
- README: drop the ./vendor-wheel.sh setup step and the
  not-yet-on-PyPI warning.
- Delete vendor-wheel.sh and the gitignored wheels/ directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(monty): harden post-execution file capture against symlink escape

Same class of issue as the MSRC-reported Hyperlight finding: the
post-execution capture walked workspace_root with Path.rglob() +
is_file() + read_bytes() - all of which follow symlinks. An attacker
who controls the workspace (cloned repo, extracted archive, shared
workspace) could pre-place `workspace/leak.txt -> /etc/passwd` or
`workspace/outside_dir -> /etc/` and have host files surface as
captured Content items.

Monty's mount layer already rejects symlink reads from inside the
sandbox across all three modes (verified empirically), so the runtime
path was safe. This commit closes the post-execution scan path.

Changes:
- New `_iter_real_files(root)` walker that uses iterdir() +
  is_symlink() to skip symlinks at every directory level and yields
  only real files. Replaces the previous `host_root.rglob("*")` calls
  in both `_snapshot_writable_mounts` and `_capture_written_files`.
- Use `Path.lstat()` instead of `Path.stat()` so size/mtime can never
  be taken from a symlink target.
- Three new integration tests reproducing the MSRC attack shape
  against the workspace_root flow: symlink-to-file outside workspace,
  symlink-to-directory outside workspace, and a guard ensuring
  legitimate sandbox writes are still captured when symlinks are
  present.
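
A minimal sketch of a symlink-skipping walker in the spirit of `_iter_real_files`, built only from the description above (iterdir() + is_symlink() at every directory level). The name `iter_real_files_sketch` and the sorted traversal order are assumptions made for illustration.

```python
from collections.abc import Iterator
from pathlib import Path


def iter_real_files_sketch(root: Path) -> Iterator[Path]:
    """Hypothetical stand-in: yield regular files under root, never following symlinks."""
    for entry in sorted(root.iterdir()):
        if entry.is_symlink():
            # Skip symlinked files and directories alike, so the walk
            # cannot reach anything outside root.
            continue
        if entry.is_dir():
            yield from iter_real_files_sketch(entry)
        elif entry.is_file():
            yield entry
```

The is_symlink() check has to come first: is_dir() and is_file() follow symlinks, so testing them before the symlink check would reintroduce the escape.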

Per user request, hyperlight is untouched in this commit (separate fix).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(monty): skip symlink regression tests when unsupported

Apply the same Windows-CI safety guard as the hyperlight fix in PR #5919:
the three symlink integration tests create symlinks via Path.symlink_to(),
which fails with OSError / NotImplementedError on unprivileged Windows
runners. Add a local _symlinks_supported helper (mirroring the one in
packages/core/tests/core/test_skills.py) and pytest.skip when symlinks
aren't available, so the tests no longer fail for environment reasons.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(monty): address PR #5915 follow-up review feedback

- _invoke_tool: drop the inspect.iscoroutinefunction(...) branch and
  always `await self.tool_map[name](**kwargs)`. Every entry in
  tool_map is `partial(FunctionTool.invoke, skip_parsing=True)` and
  FunctionTool.invoke is `async def`, so the branching was dead code -
  and on Python versions affected by cpython#98590,
  iscoroutinefunction(partial(bound_async_method, ...)) returns False,
  causing the bridge to take the asyncio.to_thread path, return an
  unawaited coroutine, and surface it as a JSON-serialization failure
  for every tool call. Added a regression test
  test_invoke_tool_awaits_partial_wrapped_async_method.

- generate_type_stubs: skip tools whose name is not a valid Python
  identifier or is a Python keyword. FunctionTool.name has no upstream
  validation, so a name like "weird-name" produced a syntax error in
  the stubs and a name like "broken\n    pass\nasync def injected"
  would inject arbitrary stub source. Non-identifier names stay
  reachable via `call_tool("weird-name", ...)` at runtime; they just
  don't get type-checked stubs. Added regression test
  test_generate_type_stubs_skips_non_identifier_tool_names.
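
The name filter described for generate_type_stubs can be sketched with two standard-library checks. The function name `stub_safe_names_sketch` is hypothetical; the real implementation may structure this differently.

```python
import keyword


def stub_safe_names_sketch(tool_names: list[str]) -> list[str]:
    """Hypothetical stand-in: keep names that can safely follow `def` in a stub."""
    return [
        name
        for name in tool_names
        # isidentifier() rejects hyphens, whitespace, and newlines;
        # iskeyword() rejects names such as "class" that pass isidentifier().
        if name.isidentifier() and not keyword.iskeyword(name)
    ]
```

Both checks are needed: `"class".isidentifier()` is true, so the keyword test is what keeps reserved words out of the generated source.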

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg
2026-05-20 02:35:23 +02:00
committed by GitHub
Unverified
parent 4b0522d62d
commit 4609535e22
29 changed files with 3738 additions and 10 deletions
@@ -18,7 +18,8 @@ This directory contains samples that demonstrate how to use hosted [Agent Framew
| 8 | [Azure AI Search RAG](responses/08_azure_search_rag/) | An agent with Retrieval Augmented Generation (RAG) capabilities backed by Azure AI Search, grounding answers in documents indexed in a pre-provisioned search index. |
| 9 | [Foundry Skills](responses/09_foundry_skills/) | An agent that uploads `SKILL.md` files to the Foundry Skills REST API and downloads them at startup, decoupling tone/policy guidelines from agent code. |
| 10 | [Foundry Memory](responses/10_foundry_memory/) | An agent with persistent semantic memory backed by an Azure AI Foundry Memory Store, using `FoundryMemoryProvider` to remember user facts across sessions. |
-| 11 | [Using deployed agent](responses/using_deployed_agent.py) | A sample demonstrating how to invoke an agent that has already been deployed to Foundry, showing how to interact with a hosted agent in code. |
+| 11 | [Monty CodeAct](responses/11_monty_codeact/) | An agent with a Monty-backed CodeAct context provider, exposing a single `execute_code` tool that runs Python in a [pydantic-monty](https://github.com/pydantic/monty) interpreter and invokes typed host tools (`compute`, `fetch_data`) from inside the sandbox. Uses the alpha `agent-framework-monty` package. |
+| 12 | [Using deployed agent](responses/using_deployed_agent.py) | A sample demonstrating how to invoke an agent that has already been deployed to Foundry, showing how to interact with a hosted agent in code. |
### Invocations API
@@ -0,0 +1,25 @@
FROM python:3.12-slim
# Bring in the `uv` binary from a pinned Astral image. Update this tag intentionally;
# `latest` would make rebuilds non-deterministic.
COPY --from=ghcr.io/astral-sh/uv:0.11.6 /uv /uvx /usr/local/bin/
ENV UV_LINK_MODE=copy \
UV_COMPILE_BYTECODE=1 \
UV_PROJECT_ENVIRONMENT=/app/.venv \
PATH="/app/.venv/bin:${PATH}"
WORKDIR /app
# Sync dependencies first to maximize Docker layer caching.
COPY pyproject.toml ./
RUN uv sync --no-install-project --no-cache
# Now copy the rest of the agent and finalize the environment.
COPY . ./
RUN uv sync --no-cache
EXPOSE 8088
CMD ["uv", "run", "--no-sync", "python", "main.py"]
@@ -0,0 +1,116 @@
# What this sample demonstrates
An [Agent Framework](https://github.com/microsoft/agent-framework) agent with a
**Monty-backed CodeAct context provider** hosted using the **Responses protocol**.
The model receives one tool (`execute_code`) and runs Python inside a
[Monty](https://github.com/pydantic/monty) interpreter; the registered host
tools (`compute`, `fetch_data`) are only reachable from inside the sandbox via
typed `await compute(...)` calls or the generic `call_tool(...)` fallback.
> [!NOTE]
> `agent-framework-monty` is an **alpha** package, so the `pyproject.toml`
> sets `[tool.uv] prerelease = "allow"` to let `uv sync` pick up the
> `1.0.0a*` release from PyPI.
## How It Works
### Model Integration
The agent uses `FoundryChatClient` to create a Responses client from the project
endpoint and the model deployment. The agent supports both streaming (SSE
events) and non-streaming (JSON) response modes.
See [main.py](main.py) for the full implementation.
### CodeAct context provider
`MontyCodeActProvider` is added to the agent via `context_providers=[...]`. On
every run it injects:
- An `execute_code` tool that runs Python in the Monty interpreter.
- Dynamic CodeAct instructions describing the available host tools and DSL.
The host tools (`compute`, `fetch_data`) are **not** exposed as direct agent
tools — the model can only call them from inside `execute_code`, either as
typed async functions (`await compute(operation="multiply", a=6, b=7)`) or via
the generic `call_tool("compute", operation="multiply", a=6, b=7)` fallback.
Code is type-checked against the host tool signatures using
[ty](https://docs.astral.sh/ty/) before any tool runs.
OS-level access (filesystem, network, subprocess) is blocked inside the
sandbox; the registered host tools retain full Python access.
### Observability
Agent Framework's [native OpenTelemetry instrumentation](https://learn.microsoft.com/en-us/agent-framework/agents/observability?pivots=programming-language-python) is enabled by setting these env vars in `agent.yaml` / `agent.manifest.yaml`:
- `ENABLE_INSTRUMENTATION=true` — turns on the framework's span/metric/log emitters.
- `ENABLE_SENSITIVE_DATA=true` — includes prompts, tool inputs, tool outputs, and completions in telemetry. **Dev/test only.**
`main.py` wires Azure Monitor at startup:
1. Reads `APPLICATIONINSIGHTS_CONNECTION_STRING` (Foundry hosting injects this automatically for the project's attached Application Insights resource; set it yourself when running locally).
2. Calls `azure.monitor.opentelemetry.configure_azure_monitor(connection_string=...)` to register Azure Monitor exporters with the global OTel tracer/meter/logger providers.
3. Calls `agent_framework.observability.enable_instrumentation()` so Agent Framework emits its `invoke_agent`, `chat`, `execute_tool`, and `execute_code` spans on those providers.
Trace linking happens automatically: the Foundry hosting layer's incoming `Responses` request becomes the **parent span**, and every framework / tool span (including the `execute_code` invocation that runs Monty) becomes a child via OpenTelemetry context propagation since both layers share the same global tracer provider. In Application Insights you can click any operation and see the full tree from inbound HTTP all the way down to individual `compute(...)` / `fetch_data(...)` calls inside the Monty sandbox.
## Running the Agent Host
This sample uses `pyproject.toml` + `uv sync` rather than the parent
README's `requirements.txt` flow. To run locally:
1. Install dependencies into a local virtual environment:
```bash
uv sync
```
2. Set the environment variables described in the
[parent README](../../README.md#running-the-agent-host-locally) (Foundry
project endpoint, model deployment, optional Application Insights), then
start the host:
```bash
uv run python main.py
```
Refer to the parent README for the shared `azd` / Docker / invocation /
deployment guidance.
## Interacting with the agent
> Depending on how you run the agent host, you can invoke the agent using
> `curl` (`Invoke-WebRequest` in PowerShell) or `azd`. Please refer to the
> [parent README](../../README.md) for more details. Use this README for
> sample queries you can send to the agent.
Send a POST request to the server with a JSON body containing an `"input"`
field. Try queries that benefit from combining Python with multiple tool calls:
```bash
curl -X POST http://localhost:8088/responses \
-H "Content-Type: application/json" \
-d '{"input": "Fetch all users, find the admins, then multiply the count by 7. Use a single execute_code call."}'
```
```bash
curl -X POST http://localhost:8088/responses \
-H "Content-Type: application/json" \
-d '{"input": "Compute the total price for one of every product in the products table. Use execute_code."}'
```
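The same request can also be built from Python with only the standard library. This is a sketch that mirrors the `curl` calls above; the helper names `build_responses_request` and `send_request` are illustrative, and the response body is returned as raw text because its shape is not described here.

```python
import json
import urllib.request


def build_responses_request(
    prompt: str, base_url: str = "http://localhost:8088"
) -> urllib.request.Request:
    """Build the POST that the curl examples send; nothing is transmitted here."""
    body = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/responses",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request) -> str:
    """Send the request to a running agent host and return the raw response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Call `send_request(build_responses_request("..."))` once the host from the previous section is listening on port 8088.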
The model should respond with one `execute_code` call whose code looks like:
```python
users = await fetch_data(table="users")
admins = [u for u in users if u["role"] == "admin"]
result = await compute(operation="multiply", a=len(admins), b=7)
print(result)
```
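To see what that snippet evaluates to without a Monty runtime, the sketch below runs the same logic under plain `asyncio`. The two `async def` functions are stand-ins written for this illustration (the sample's host tools are ordinary `def` functions that the sandbox exposes as awaitables), and they reuse the simulated `users` rows from `main.py`.

```python
import asyncio
from typing import Any


async def fetch_data(table: str) -> list[dict[str, Any]]:
    """Stand-in for the sample's host tool, with the same simulated users."""
    data: dict[str, list[dict[str, Any]]] = {
        "users": [
            {"id": 1, "name": "Alice", "role": "admin"},
            {"id": 2, "name": "Bob", "role": "user"},
            {"id": 3, "name": "Charlie", "role": "admin"},
        ],
    }
    return data.get(table, [])


async def compute(operation: str, a: float, b: float) -> float:
    """Stand-in for the sample's host tool; only 'multiply' is sketched."""
    if operation != "multiply":
        raise ValueError(f"unsupported operation in this sketch: {operation}")
    return float(a) * float(b)


async def snippet() -> float:
    # Same steps as the code the model is expected to generate.
    users = await fetch_data(table="users")
    admins = [u for u in users if u["role"] == "admin"]
    return await compute(operation="multiply", a=len(admins), b=7)


def main() -> None:
    print(asyncio.run(snippet()))  # two admins * 7 -> 14.0
```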
## Deploying the Agent to Foundry
To host the agent on Foundry, follow the instructions in the
[Deploying the Agent to Foundry](../../README.md#deploying-the-agent-to-foundry)
section of the README in the parent directory.
@@ -0,0 +1,28 @@
name: agent-framework-agent-monty-codeact-responses
description: >
  An Agent Framework agent with a Monty-backed CodeAct context provider hosted by Foundry.
metadata:
  tags:
    - Agent Framework
    - AI Agent Hosting
    - Azure AI AgentServer
    - Responses Protocol
    - CodeAct
    - Monty
template:
  name: agent-framework-agent-monty-codeact-responses
  kind: hosted
  protocols:
    - protocol: responses
      version: 1.0.0
  environment_variables:
    - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
      value: "{{AZURE_AI_MODEL_DEPLOYMENT_NAME}}"
    - name: ENABLE_INSTRUMENTATION
      value: "true"
    - name: ENABLE_SENSITIVE_DATA
      value: "true"
resources:
  - kind: model
    id: gpt-4.1-mini
    name: AZURE_AI_MODEL_DEPLOYMENT_NAME
@@ -0,0 +1,15 @@
kind: hosted
name: agent-framework-agent-monty-codeact-responses
protocols:
  - protocol: responses
    version: 1.0.0
resources:
  cpu: "0.25"
  memory: 0.5Gi
environment_variables:
  - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
    value: ${AZURE_AI_MODEL_DEPLOYMENT_NAME}
  - name: ENABLE_INSTRUMENTATION
    value: "true"
  - name: ENABLE_SENSITIVE_DATA
    value: "true"
@@ -0,0 +1,136 @@
# Copyright (c) Microsoft. All rights reserved.
import logging
import os
from typing import Annotated, Any, Literal
from agent_framework import Agent, tool
from agent_framework.foundry import FoundryChatClient
from agent_framework.observability import enable_instrumentation
from agent_framework_foundry_hosting import ResponsesHostServer
from agent_framework_monty import MontyCodeActProvider
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
from pydantic import Field
# Load environment variables from .env file (no-op when injected by Foundry).
load_dotenv()
logger = logging.getLogger(__name__)
def _setup_telemetry() -> None:
    """Wire Agent Framework spans to the Application Insights resource attached to the Foundry project.

    Foundry-hosted runtimes inject ``APPLICATIONINSIGHTS_CONNECTION_STRING`` automatically;
    locally you can set it yourself (see README). When the connection string is present we
    configure Azure Monitor OTel exporters once and then flip the framework's instrumentation
    flag so it emits ``invoke_agent`` / ``chat`` / ``execute_tool`` spans. The hosting layer's
    incoming-request span becomes the parent automatically via OpenTelemetry context
    propagation when both layers share the same global tracer provider.
    """
    connection_string = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
    if not connection_string:
        logger.info(
            "APPLICATIONINSIGHTS_CONNECTION_STRING is not set; Agent Framework spans will not "
            "be exported to Azure Monitor. Set the env var to enable telemetry."
        )
        return
    try:
        from azure.monitor.opentelemetry import configure_azure_monitor
    except ImportError:
        logger.warning(
            "azure-monitor-opentelemetry is not installed; skipping Azure Monitor setup. "
            "Install it to export telemetry."
        )
        return
    # Configure the global OTel providers (tracer/meter/logger) to export to Azure Monitor.
    # Safe against repeated imports because it is only called from this entry point.
    configure_azure_monitor(connection_string=connection_string)
    # Flip the Agent Framework instrumentation flag so its spans are actually emitted on
    # the now-configured global providers.
    enable_instrumentation()
    logger.info("Azure Monitor + Agent Framework instrumentation enabled.")
@tool(approval_mode="never_require")
def compute(
    operation: Annotated[
        Literal["add", "subtract", "multiply", "divide"],
        Field(description="Math operation: add, subtract, multiply, or divide."),
    ],
    a: Annotated[float, Field(description="First numeric operand.")],
    b: Annotated[float, Field(description="Second numeric operand.")],
) -> float:
    """Perform a math operation used by sandboxed code."""
    operations = {
        "add": a + b,
        "subtract": a - b,
        "multiply": a * b,
        "divide": a / b if b else float("inf"),
    }
    return operations[operation]
@tool(approval_mode="never_require")
def fetch_data(
    table: Annotated[str, Field(description="Name of the simulated table to query.")],
) -> list[dict[str, Any]]:
    """Fetch simulated records from a named table."""
    data: dict[str, list[dict[str, Any]]] = {
        "users": [
            {"id": 1, "name": "Alice", "role": "admin"},
            {"id": 2, "name": "Bob", "role": "user"},
            {"id": 3, "name": "Charlie", "role": "admin"},
        ],
        "products": [
            {"id": 101, "name": "Widget", "price": 9.99},
            {"id": 102, "name": "Gadget", "price": 19.99},
        ],
    }
    return data.get(table, [])
def main() -> None:
    """Host a Monty CodeAct agent over the Responses protocol."""
    # Set up telemetry BEFORE building the client/agent so the framework picks up
    # the configured tracer provider when it lazily wires instrumentation.
    _setup_telemetry()
    client = FoundryChatClient(
        project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
        model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
        credential=DefaultAzureCredential(),
    )
    # MontyCodeActProvider injects a sandboxed `execute_code` tool into every
    # agent run, plus dynamic instructions describing the registered host tools.
    # The host tools are hidden from the model - they can only be invoked from
    # inside the sandbox (`await compute(...)` or `call_tool(...)`).
    codeact = MontyCodeActProvider(
        tools=[compute, fetch_data],
        approval_mode="never_require",
    )
    agent = Agent(
        client=client,
        instructions=(
            "You are a friendly assistant. Use `execute_code` to combine "
            "Python control flow with the provided host tools whenever the "
            "task requires lookups, transformations, or computation."
        ),
        context_providers=[codeact],
        # The hosting infrastructure manages conversation history, so the
        # service does not need to store it. Learn more at:
        # https://developers.openai.com/api/reference/resources/responses/methods/create
        default_options={"store": False},
    )
    server = ResponsesHostServer(agent)
    server.run()


if __name__ == "__main__":
    main()
@@ -0,0 +1,20 @@
[project]
name = "agent-framework-agent-monty-codeact-responses"
version = "0.1.0"
description = "Foundry-hosted Agent Framework agent with a Monty-backed CodeAct context provider."
requires-python = ">=3.12,<3.14"
dependencies = [
"agent-framework-foundry",
"agent-framework-foundry-hosting",
# agent-framework-monty is an alpha (1.0.0a*) release on PyPI.
"agent-framework-monty",
# Azure Monitor OpenTelemetry exporter; used to send agent telemetry to the
# Application Insights instance attached to the Foundry project.
"azure-monitor-opentelemetry",
]
[tool.uv]
# `agent-framework-monty` is an alpha package; allow the prerelease resolver
# to pick up 1.0.0a* releases from PyPI.
prerelease = "allow"