Python: Shell tool with support for local and Docker (#5664)

* feat(tools): add cross-OS LocalShellTool in new agent-framework-tools package

Introduces a safe, cross-OS local shell tool as the first citizen of a new
agent-framework-tools workspace package. Supports persistent (default) and
stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist,
allowlist, approval gating, process-tree kill on timeout, output truncation,
and audit hooks. Integrates with existing provider get_shell_tool(func=...)
factories via FunctionTool kind='shell'.

See docs/decisions/0026-builtin-tools-local-shell.md for the full design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): security hardening for LocalShellTool

Codifies what LocalShellTool does and does not defend against, and
delegates the security-relevant lifecycle primitive to a battle-tested
library instead of hand-rolled per-OS code.

Changes:

- Adopt psutil for cross-OS process-tree termination (executor + session).
  Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH
  poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail,
  not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require',
  making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses
  we DO claim are asserted; non-defenses (denylist bypasses via
  backslash insertion, variable expansion, interpreter escape, base64,
  alternative tools, PowerShell-native verbs) are documented as
  expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.
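
The acknowledge_unsafe opt-in could be enforced with a constructor-time check along these lines. This is a minimal sketch: `check_approval_mode` is a hypothetical helper written for illustration, not the code LocalShellTool actually uses.

```python
def check_approval_mode(approval_mode: str, acknowledge_unsafe: bool = False) -> str:
    """Hypothetical guard: refuse to disable approval unless explicitly acknowledged."""
    if approval_mode not in ("always_require", "never_require"):
        raise ValueError(f"unknown approval_mode: {approval_mode!r}")
    if approval_mode == "never_require" and not acknowledge_unsafe:
        # The unsafe path must be requested by name, never reached by accident.
        raise ValueError("approval_mode='never_require' requires acknowledge_unsafe=True")
    return approval_mode
```

Raising at construction time means a misconfigured tool fails before any command is run.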

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): add DockerShellTool sandboxed shell tier

Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.

Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.
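
To illustrate the plug-in point, the sketch below defines a structural protocol and a toy backend. `ExecutorLike`, `Result`, and `EchoExecutor` are hypothetical stand-ins; the real ShellExecutor and ShellResult definitions live in the package and may differ in fields and signature.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass
class Result:
    """Hypothetical stand-in for the package's ShellResult."""

    stdout: str
    stderr: str
    exit_code: int
    timed_out: bool = False


@runtime_checkable
class ExecutorLike(Protocol):
    """Hypothetical stand-in for the ShellExecutor protocol."""

    async def run(self, command: str, *, timeout: Optional[float] = None) -> Result: ...


class EchoExecutor:
    """Toy backend that never spawns a process; it only echoes the command."""

    async def run(self, command: str, *, timeout: Optional[float] = None) -> Result:
        return Result(stdout=command, stderr="", exit_code=0)


async def run_with(executor: ExecutorLike, command: str) -> Result:
    # Any object with a matching ``run`` coroutine satisfies the protocol;
    # no inheritance from a framework base class is needed.
    return await executor.run(command, timeout=5.0)
```

Because the protocol is structural, a custom backend only has to provide a compatible `run` coroutine.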

Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.

Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): backport shell-tool fixes from .NET parity review

Applies the applicable subset of bug fixes accumulated during the
.NET shell-tool PR review (microsoft/agent-framework#5604) to the
Python shell tool.

A1 - Quote workdir safely in _maybe_reanchor

  Previously _tool.py used double-quote interpolation when emitting
  the cd/Set-Location prefix, which expanded $VAR, $(), and backticks
  in the workdir path. A workdir containing shell metacharacters could
  trigger arbitrary command execution before the user command ran.

  Replaced with single-quote escaping helpers _quote_posix and
  _quote_powershell that emit literal-string forms safe for both
  hosts.
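
The single-quote escaping described above can be sketched as follows. `quote_posix` and `quote_powershell` here are illustrative re-implementations of the general technique, not copies of the `_quote_posix` / `_quote_powershell` helpers in the package.

```python
import shlex


def quote_posix(value: str) -> str:
    """Wrap ``value`` in single quotes for POSIX shells.

    A single quote cannot appear inside a single-quoted string, so each one
    is emitted as: close quote, backslash-escaped quote, reopen quote.
    """
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell(value: str) -> str:
    """Wrap ``value`` in single quotes for PowerShell, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


# A workdir that would run a command under double-quote interpolation.
workdir = "/tmp/$(touch pwned)/it's"
print("cd " + quote_posix(workdir))
print("Set-Location " + quote_powershell(workdir))
```

Inside single quotes neither shell expands `$VAR`, `$()`, or backticks, so the path is passed through literally.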

A5/A6 - Consolidate truncation to a single byte-aware helper

  Extracted a shared truncate_head_tail / truncate_text_head_tail
  helper in _truncate.py. The new implementation distributes odd
  caps so head receives floor(cap/2) and tail receives ceil(cap/2)
  bytes, matching the .NET round-9 fix and ensuring no input bytes
  are silently dropped on the boundary.

  _session.py previously truncated by Python str length while the
  caller passed _max_output_bytes - the unit mismatch is now gone:
  raw byte buffers go through truncate_head_tail and decoded text
  goes through truncate_text_head_tail.
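
The head/tail split can be sketched as below. `truncate_head_tail_sketch` is a hypothetical simplification: the real helper's return type and any truncation marker it inserts are not shown in this commit message, so the sketch returns only the kept bytes and a flag. The ValueError for non-positive caps follows the later round-2 fix in this PR.

```python
from typing import Tuple


def truncate_head_tail_sketch(data: bytes, cap: int) -> Tuple[bytes, bool]:
    """Keep at most ``cap`` bytes: the first floor(cap/2) and the last ceil(cap/2)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    if len(data) <= cap:
        return data, False
    head = cap // 2    # floor(cap / 2)
    tail = cap - head  # ceil(cap / 2); always >= 1 because cap >= 1
    return data[:head] + data[-tail:], True
```

Computing the tail as `cap - head` guarantees the two halves sum to exactly `cap`, including for odd caps.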

Unit tests added for the truncate and quote helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(tools): tone down narrative and overconfident comments in shell tool

The shell tool's docstrings and comments contained two patterns that
the .NET review pushed back on:

- Narrative framing about implementation history ("hard-won",
  "we sidestep", "design inspiration: ...", competitor framework
  name-drops in module docstrings).
- Overstated security guarantees ("battle-tested",
  "reasonable for untrusted input", "recommended executor for any
  agent that runs commands from untrusted input",
  "destructive commands are blocked", "safe local shell tool",
  "blocks shell injection").

Rewrites the affected docstrings and comments to describe what the
code does in neutral terms. Behaviour is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): add ShellEnvironmentProvider for the Python shell tool

Ports the .NET ShellEnvironmentProvider as a Python ContextProvider
so agents using LocalShellTool or DockerShellTool can be primed with
an accurate description of the shell they're talking to (family,
version, OS, working directory, and which CLIs are available).

The provider runs probes through any ShellExecutor, caches the
resulting snapshot, and on every before_run extends the session
instructions with a markdown block describing the shell idiom to
use. A failed first probe leaves the cache empty so the next call
retries (no permanent poisoning).
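
The cache-and-retry behaviour can be sketched as below. `CachedProbe` is a hypothetical helper that stores a plain string; the real provider caches a richer snapshot and its internals may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CachedProbe:
    """Hypothetical helper: run an async probe once and reuse the result."""

    def __init__(self, probe: Callable[[], Awaitable[str]]) -> None:
        self._probe = probe
        self._snapshot: Optional[str] = None
        self._lock: Optional[asyncio.Lock] = None

    def _get_lock(self) -> asyncio.Lock:
        # Created lazily so the lock is bound to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def get(self) -> str:
        if self._snapshot is not None:
            return self._snapshot
        async with self._get_lock():
            # A caller that waited on the lock finds the cached value here.
            if self._snapshot is None:
                # If the probe raises, nothing is stored, so the next call retries.
                self._snapshot = await self._probe()
            return self._snapshot
```

Concurrent first callers queue on the lock, so only one of them runs the probe.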

Probe failures from a narrow set of expected error types
(ShellCommandError, ShellExecutionError, ShellTimeoutError, and
asyncio.TimeoutError from the per-probe timeout) are recorded as
None fields in the snapshot. Other exceptions propagate. Tool
names are validated against ^[A-Za-z0-9._-]+$ before being
interpolated into a probe command.
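
The name check can be sketched as below. The regular expression is the one quoted above; `build_probe_command` and the `command -v` probe text are assumptions made for illustration.

```python
import re

_TOOL_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def build_probe_command(tool: str) -> str:
    """Hypothetical helper: validate ``tool`` before interpolating it into a probe."""
    # fullmatch (rather than match) also rejects a trailing newline, which
    # ``$`` alone would tolerate.
    if not _TOOL_NAME.fullmatch(tool):
        raise ValueError(f"invalid tool name: {tool!r}")
    return f"command -v {tool}"
```

Rejecting anything outside the allowed character set means quoting is never relied on for names that end up in a command string.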

Includes 12 unit tests covering happy path, stderr fallback,
timeout handling, expected/unexpected exception paths, malicious
tool name rejection, case-insensitive deduplication, retry after
failure, concurrent first-callers sharing one probe, and the
default and custom formatter paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(tools): document ShellEnvironmentProvider and finish comment cleanup

Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(tools): move shell samples to python/samples/02-agents/tools

The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(tools): cover confine_workdir default and ShellResult.format_for_model

Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): address PR #5664 review feedback

- _tool.py: detect PowerShell via is_powershell() helper instead of basename string match

- _environment.py: use public ContextProvider import (no private _ prefix)

- _session.py: trim _stdout_buf/_stderr_buf after copying to avoid unbounded retention across calls

- _docker.py: short-circuit start()/close() in stateless mode; add configurable shell kwarg (default bash, e.g. 'sh' for alpine)

- tests: parenthesized multi-line assert; alpine integration tests now pass shell='sh'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): satisfy CI quality gates

- pyupgrade: drop quoted self-class refs in __aenter__/method annotations

- ruff format: reflow long lines per workspace style

- pyright: assert psutil non-None in optional-import branch; lowercase mutable module globals; annotate _approval_mode as Literal so tool() Literal-typed kwarg is accepted; add ... body to ShellExecutor.run protocol; remove unused deprecated _kill_tree wrapper

- tests: skip docker integration tests on win32 (Windows containers don't support --read-only / alpine images)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove DEFAULT_DENYLIST; document single-session ownership; fix bandit findings

Mirrors the .NET PR #5604 cleanup:

- Remove DEFAULT_DENYLIST from ShellPolicy. ShellPolicy() now ships with an empty deny-list; operators opt into site-specific patterns explicitly. No major agent framework uses regex matching as a primary security control; AutoGen v2 removed theirs. Approval gating + sandbox tier remain the real boundaries.

- Rewrite module / class docstrings to frame ShellPolicy as a UX pre-filter, not a security control.

- Add Single-session ownership paragraphs to ShellExecutor, ShellSession, LocalShellTool, and DockerShellTool: a persistent-mode tool is owned by exactly one conversation / agent session; do not share across users or concurrent conversations.

- Tests now supply explicit deny patterns instead of relying on a default.

- Address Pre-commit Hooks (bandit) CI failures: convert internal-invariant asserts to explicit RuntimeError, annotate intentional subprocess/shell usage with # nosec, document container-internal /tmp paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #5664 round-2 review feedback

Deny-list documentation drift:

- README and the OpenAI/local-shell sample no longer claim a built-in deny-list of destructive commands. ShellPolicy is described as an optional, operator-supplied UX pre-filter; the real boundaries remain approval gating and the sandbox tier.

Behavioural fixes called out in review:

- ShellPolicy.evaluate() now denies empty / whitespace-only commands explicitly instead of returning allow with no rationale.

- truncate_head_tail() raises ValueError for cap <= 0 instead of silently returning the full input with truncated=False, which previously could defeat output-capping in callers that mis-configured the budget.

- LocalShellTool.as_function() / DockerShellTool.as_function() return the ShellCommandError text directly so the model sees a single, non-redundant 'Command rejected by policy: …' message instead of the prior duplicated 'Command blocked by policy: Command rejected …' wrapping.

- ShellSession POSIX sentinel trailer now snapshots and restores the prior errexit (set -e) state around the trailer, so a user 'set -e' in the persistent shell is no longer permanently disabled by the next run().
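
The empty-command rule from the first bullet above can be pictured with a small evaluator. This is a sketch under stated assumptions: `Decision` and `evaluate` are hypothetical, and the ordering of deny and allow patterns is a guess rather than the documented behaviour of ShellPolicy.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    """Hypothetical stand-in for ShellDecision."""

    decision: str  # "allow" or "deny"
    reason: str


def evaluate(command: str, deny: Sequence[str] = (), allow: Sequence[str] = ()) -> Decision:
    # Empty or whitespace-only input is denied with an explicit reason.
    if not command.strip():
        return Decision("deny", "empty command")
    # Assumption: deny patterns are checked before allow patterns.
    for pattern in deny:
        if re.search(pattern, command):
            return Decision("deny", f"matched deny pattern {pattern!r}")
    if allow and not any(re.search(p, command) for p in allow):
        return Decision("deny", "no allow pattern matched")
    return Decision("allow", "no rule matched")
```

With no patterns supplied, every non-empty command is allowed, which matches the empty default policy described earlier in this PR.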

Tests:

- New test_shell_parse_rc.py covers the full _parse_rc() edge-case surface (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits).

- test_policy.py asserts the new empty-command deny.

- test_shell_truncate_and_quote.py asserts ValueError for cap=0 and cap<0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback for shell tool

- _resolve.py: reject empty/whitespace shell override string
- _tool.py / _docker.py: mode-aware default tool description (persistent vs stateless)
- _tool.py: fix misleading workdir docstring (re-anchor, not blocking)
- _types.py: emit stream-agnostic [output truncated] marker
- _policy.py: declare _denies/_allows as dataclass fields
- _environment.py: use $(pwd) instead of $PWD in POSIX probe

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback: shell override flag + probe timeout safety

- _resolve.py: in stateless mode, ensure shell overrides end with -c/-Command so commands aren't misinterpreted as script-file paths.
- ShellExecutor.run / LocalShellTool.run / DockerShellTool.run now accept an optional timeout kwarg; ShellEnvironmentProvider drops the outer asyncio.wait_for and lets the executor enforce the probe timeout internally, so cancellation no longer risks leaving a hung subprocess or corrupted session.
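
The command-flag rule from the first bullet above can be sketched as below. `ensure_command_flag` is a hypothetical helper, not the code in _resolve.py; abbreviated PowerShell forms of -Command are not recognised by this sketch.

```python
from typing import List, Sequence


def ensure_command_flag(argv: Sequence[str], *, powershell: bool) -> List[str]:
    """Hypothetical helper: make sure a stateless shell argv ends with its command flag."""
    if not argv or not argv[0].strip():
        raise ValueError("shell override must not be empty")
    result = list(argv)
    if powershell:
        # PowerShell parameter names are case-insensitive.
        has_flag = result[-1].lower() == "-command"
        flag = "-Command"
    else:
        # POSIX shells distinguish -c from -C, so compare exactly.
        has_flag = result[-1] == "-c"
        flag = "-c"
    if not has_flag:
        result.append(flag)
    return result
```

Without the trailing flag, the shell would treat the next argument as a script path instead of a command string.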

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback: docker isolation + lifecycle robustness

- pyproject.toml: bump agent-framework-core minimum from 1.2.0 to 1.2.2 to align with the rest of the workspace.
- _docker.py: validate extra_run_args at construction time and reject flags that would dismantle the isolation defaults (--privileged, --cap-add, --security-opt, --network/--net, -v/--volume/--mount, --device, --pid, --ipc, --userns, --user, --read-only, --tmpfs, --add-host, --gpus, --cgroupns, --device-cgroup-rule); also documented the warning on the docstring.
- _docker._stop_container: retry docker rm -f once and log a warning/error when it does not succeed, so operators can audit leaked containers instead of getting a silent success.
- _docker._run_stateless timeout path: fall back to docker rm -f when docker kill fails or times out (--rm only reaps on clean exit), and log instead of silently swallowing communicate() errors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
Ben Thomas
2026-05-21 17:29:59 -07:00
committed by GitHub
parent afd2739e38
commit 8e54f0b0e7
32 changed files with 4269 additions and 63 deletions
@@ -0,0 +1,14 @@
# Copyright (c) Microsoft. All rights reserved.

"""Built-in tools for the Microsoft Agent Framework."""

from __future__ import annotations

import importlib.metadata

try:
    __version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
    __version__ = "0.0.0"

__all__ = ["__version__"]
@@ -0,0 +1,53 @@
# Copyright (c) Microsoft. All rights reserved.

"""Cross-platform local shell tool for the Microsoft Agent Framework."""

from __future__ import annotations

from ._docker import (
    DEFAULT_IMAGE as DOCKER_DEFAULT_IMAGE,
)
from ._docker import (
    DockerNotAvailableError,
    DockerShellTool,
    is_docker_available,
)
from ._environment import (
    ShellEnvironmentProvider,
    ShellEnvironmentProviderOptions,
    ShellEnvironmentSnapshot,
    ShellFamily,
    default_instructions_formatter,
)
from ._executor_base import ShellExecutor
from ._policy import ShellDecision, ShellPolicy, ShellRequest
from ._tool import LocalShellTool
from ._types import (
    ShellCommandError,
    ShellExecutionError,
    ShellMode,
    ShellResult,
    ShellTimeoutError,
)

__all__ = [
    "DOCKER_DEFAULT_IMAGE",
    "DockerNotAvailableError",
    "DockerShellTool",
    "LocalShellTool",
    "ShellCommandError",
    "ShellDecision",
    "ShellEnvironmentProvider",
    "ShellEnvironmentProviderOptions",
    "ShellEnvironmentSnapshot",
    "ShellExecutionError",
    "ShellExecutor",
    "ShellFamily",
    "ShellMode",
    "ShellPolicy",
    "ShellRequest",
    "ShellResult",
    "ShellTimeoutError",
    "default_instructions_formatter",
    "is_docker_available",
]
@@ -0,0 +1,700 @@
# Copyright (c) Microsoft. All rights reserved.

"""Sandboxed shell tool backed by a Docker (or compatible) container runtime.

``DockerShellTool`` exposes the same public surface as
:class:`LocalShellTool` but executes commands inside a container. The
container is intended to be the security boundary; effective isolation
depends on the host runtime configuration, image contents, and the flags
passed at launch.

Default flags applied at launch:

- ``--network none``: no host or external network access.
- ``--user 65534:65534``: runs as ``nobody:nogroup``.
- ``--read-only`` root filesystem; only the optional ``host_workdir``
  mount is writable when ``mount_readonly=False``.
- ``--memory``, ``--pids-limit`` set to bounded values.
- ``--cap-drop=ALL`` and ``--security-opt=no-new-privileges``.
- ``--tmpfs /tmp`` so commands that need scratch space have somewhere to
  write that doesn't escape the container.

Persistent mode reuses :class:`ShellSession` by launching
``docker exec -i <container> bash`` as the long-lived shell; the
sentinel protocol works unchanged because the session is still talking
to a bash REPL over pipes.

**Single-session ownership.** In persistent mode a
:class:`DockerShellTool` owns a long-lived container plus the bash REPL
inside it. The container's filesystem, environment, working directory,
and any artifacts the agent has produced are visible to every subsequent
command, and a single stdin/stdout pipe serializes every call. A
persistent-mode tool is therefore intended to be owned by exactly one
conversation / agent session (i.e. one user). Do not share one instance
across users, tenants, or concurrent conversations: their state leaks
together inside the container and commands queue behind each other.
Create one tool per session and close it (or use ``async with``) when
the session ends; closing stops and removes the container. If a shared
instance is genuinely required, use ``mode="stateless"`` so each call
gets its own throwaway ``docker run --rm``.
"""

from __future__ import annotations

import asyncio
import contextlib
import logging
import os
import secrets
import shutil
import subprocess  # noqa: S404  # nosec B404 - running shell commands is the whole point of this tool
import time
from collections.abc import Callable, Mapping, Sequence
from typing import Literal

from agent_framework import FunctionTool, tool
from agent_framework._tools import SHELL_TOOL_KIND_VALUE

from ._policy import ShellPolicy, ShellRequest
from ._session import ShellSession
from ._truncate import truncate_head_tail as _truncate_bytes
from ._types import ShellCommandError, ShellMode, ShellResult

logger = logging.getLogger(__name__)

DEFAULT_IMAGE = "mcr.microsoft.com/azurelinux/base/core:3.0"
DEFAULT_CONTAINER_USER = "65534:65534"  # nobody:nogroup on most distros
DEFAULT_NETWORK = "none"
DEFAULT_MEMORY = "512m"
DEFAULT_PIDS_LIMIT = 256
DEFAULT_WORKDIR = "/workspace"

# Docker run flags that would silently dismantle the isolation defaults
# (--cap-drop ALL, --security-opt no-new-privileges, --network none,
# --read-only root, pids/memory caps) if spliced in via ``extra_run_args``.
# These are rejected at construction time so the failure surfaces loudly
# rather than as a silently-unsandboxed container at runtime.
_BLOCKED_EXTRA_RUN_FLAGS: tuple[str, ...] = (
    "--privileged",
    "--cap-add",
    "--security-opt",
    "--network",
    "--net",
    "--volume",
    "-v",
    "--mount",
    "--device",
    "--device-cgroup-rule",
    "--pid",
    "--ipc",
    "--userns",
    "--user",
    "--cgroupns",
    "--add-host",
    "--gpus",
    "--read-only",
    "--tmpfs",
)


def _validate_extra_run_args(args: Sequence[str]) -> None:
    """Reject extra ``docker run`` args that would break the isolation contract.

    A caller can otherwise pass ``["--privileged"]``, ``["--network=host"]``,
    ``["-v", "/:/host:rw"]``, etc. and silently undo every isolation flag
    this tool sets. The blocklist covers the obvious offenders; operators
    that genuinely need one of these flags should subclass the tool or
    build their own argv rather than slip past the check.
    """
    bad: list[str] = []
    for raw in args:
        if not raw.startswith("-"):
            continue
        # Split off any "=value" tail so "--network=host" matches "--network".
        flag = raw.split("=", 1)[0]
        if flag in _BLOCKED_EXTRA_RUN_FLAGS:
            bad.append(raw)
    if bad:
        raise ValueError(
            "extra_run_args contains flags that would dismantle DockerShellTool's "
            f"isolation defaults: {bad}. Override these via dedicated constructor "
            "arguments (network, host_workdir, mount_readonly, read_only_root, "
            "memory, pids_limit, user) or subclass the tool if you really need "
            "to relax the sandbox."
        )


class DockerNotAvailableError(RuntimeError):
    """Raised when the configured docker binary cannot be reached."""


def is_docker_available(binary: str = "docker") -> bool:
    """Return ``True`` if ``binary`` is on PATH and the daemon responds."""
    if shutil.which(binary) is None:
        return False
    try:
        out = subprocess.run(  # noqa: S603  # nosec B603 - argv is built from trusted binary name
            [binary, "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            timeout=5.0,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return out.returncode == 0 and bool(out.stdout.strip())


# ---------------------------------------------------------------------------
# Pure argv builders. Kept side-effect-free so unit tests don't need Docker.
# ---------------------------------------------------------------------------


def build_run_argv(
    *,
    binary: str,
    image: str,
    container_name: str,
    user: str,
    network: str,
    memory: str,
    pids_limit: int,
    workdir: str,
    host_workdir: str | None,
    mount_readonly: bool,
    read_only_root: bool,
    extra_env: Mapping[str, str] | None,
    extra_args: Sequence[str] | None,
) -> list[str]:
    """Build the ``docker run -d`` argv that starts the long-lived container.

    The container runs ``sleep infinity`` so it stays alive while the
    session uses ``docker exec`` for individual commands.
    """
    argv: list[str] = [
        binary,
        "run",
        "-d",
        "--rm",
        "--name",
        container_name,
        "--user",
        user,
        "--network",
        network,
        "--memory",
        memory,
        "--pids-limit",
        str(pids_limit),
        "--cap-drop",
        "ALL",
        "--security-opt",
        "no-new-privileges",
        "--tmpfs",
        "/tmp:rw,nosuid,nodev,size=64m",  # noqa: S108, # nosec B108 - tmpfs inside the container, not on the host
        "--workdir",
        workdir,
    ]
    if read_only_root:
        argv.append("--read-only")
    if host_workdir is not None:
        ro = "ro" if mount_readonly else "rw"
        argv.extend(["-v", f"{host_workdir}:{workdir}:{ro}"])
    if extra_env:
        for k, v in extra_env.items():
            argv.extend(["-e", f"{k}={v}"])
    if extra_args:
        argv.extend(extra_args)
    argv.extend([image, "sleep", "infinity"])
    return argv


def build_exec_argv(
    *,
    binary: str,
    container_name: str,
    interactive: bool,
    shell: str = "bash",
) -> list[str]:
    """Build the ``docker exec -i <container> <shell>`` argv.

    For persistent mode this is the long-lived shell that
    :class:`ShellSession` reads/writes via stdin/stdout pipes.
    """
    argv = [binary, "exec", "-i", container_name, shell]
    if not interactive:
        # Stateless: ``docker exec`` is run per-command; <shell> -c <cmd> is
        # appended later by run_stateless.
        argv.extend(["-c"])  # caller appends the command
    elif shell == "bash":
        argv.extend(["--noprofile", "--norc"])
    return argv


# ---------------------------------------------------------------------------
# DockerShellTool
# ---------------------------------------------------------------------------

_PERSISTENT_DESCRIPTION = (
    "Execute a single shell command inside an isolated Docker container "
    "and return its stdout, stderr, and exit code. Commands run in a "
    "persistent session so `cd` and environment variables from previous "
    "calls are preserved within the container."
)

_STATELESS_DESCRIPTION = (
    "Execute a single shell command inside an isolated Docker container "
    "and return its stdout, stderr, and exit code. Each command runs in a "
    "fresh container, so `cd` and environment variables do not persist "
    "between calls."
)


def _default_description(mode: ShellMode) -> str:
    return _PERSISTENT_DESCRIPTION if mode == "persistent" else _STATELESS_DESCRIPTION


class DockerShellTool:
    """Shell tool that runs commands inside a Docker (or compatible) container.

    **Single-session ownership.** In persistent mode this tool owns a
    long-lived container plus the bash REPL inside it; it is intended to
    be owned by a single conversation / agent session (i.e. one user).
    Do not share one instance across users, tenants, or concurrent
    conversations: state leaks together inside the container and commands
    queue behind each other. Create one tool per session and close it
    (or use ``async with``) when the session ends. If a shared instance
    is genuinely required, use ``mode="stateless"``. See the module
    docstring for more.

    Args:
        image: OCI image to run. Defaults to a small Microsoft-maintained
            base image. Override with anything that includes ``bash`` and
            (for persistent mode) ``sleep``.
        container_name: Optional explicit name. When ``None`` a unique
            name is generated per instance.
        mode: ``"persistent"`` (default) keeps a single long-lived
            container with `cd`/`export` carrying across calls.
            ``"stateless"`` runs each command in a fresh ``docker run --rm``.
        host_workdir: Optional host directory to mount into the container
            at ``workdir``. Mounted read-only by default; pass
            ``mount_readonly=False`` to allow writes.
        workdir: Path inside the container. Default ``/workspace``.
        mount_readonly: When ``True`` (default), mount ``host_workdir`` ro.
        network: Docker network mode. Default ``"none"`` for no network.
        memory: Container memory limit (e.g. ``"512m"``, ``"2g"``).
        pids_limit: Max processes inside the container.
        user: ``UID:GID`` to run as. Default ``65534:65534`` (nobody).
        read_only_root: Mount the root filesystem read-only. Default ``True``.
        extra_run_args: Additional args appended to ``docker run``.

            .. warning::
                This parameter can dismantle the tool's isolation
                contract. Flags that would undo the default sandbox
                (``--privileged``, ``--cap-add``, ``--security-opt``,
                ``--network``/``--net``, ``-v``/``--volume``,
                ``--mount``, ``--device``, ``--pid``, ``--ipc``,
                ``--userns``, ``--user``, ``--read-only``,
                ``--tmpfs``, ``--add-host``, ``--gpus``, ``--cgroupns``,
                ``--device-cgroup-rule``) are rejected at construction
                time. Override the corresponding dedicated argument
                (``network``, ``host_workdir``, ``mount_readonly``,
                ``read_only_root``, ``user``, etc.) instead. If you
                genuinely need to relax the sandbox further, subclass
                the tool; don't slip past this check.
        env: Environment variables to set inside the container. These are
            passed via ``-e`` and apply to every command.
        policy: Optional :class:`ShellPolicy`. Less critical than for
            ``LocalShellTool`` since the container is the intended
            isolation layer, but useful as a UX pre-filter (and for audit
            logging). Defaults to an empty policy; supply patterns
            explicitly to enable filtering.
        timeout: Per-command timeout in seconds.
        max_output_bytes: Combined stdout/stderr byte cap before truncation.
        approval_mode: Controls the FunctionTool approval gate. Unlike
            ``LocalShellTool``, ``"never_require"`` is permitted without
            ``acknowledge_unsafe`` because the container (when launched
            with the default isolation flags and a trusted runtime) is
            the intended boundary rather than approval.
        on_command: Audit hook fired for every allowed command.
        docker_binary: Override (e.g. ``"podman"``).
        shell: Shell binary to invoke inside the container. Defaults to
            ``"bash"``; pass ``"sh"`` for minimal images such as Alpine
            that don't ship bash. Anything else must be present on
            ``$PATH`` inside the image.
    """

    def __init__(
        self,
        *,
        image: str = DEFAULT_IMAGE,
        container_name: str | None = None,
        mode: ShellMode = "persistent",
        host_workdir: str | os.PathLike[str] | None = None,
        workdir: str = DEFAULT_WORKDIR,
        mount_readonly: bool = True,
        network: str = DEFAULT_NETWORK,
        memory: str = DEFAULT_MEMORY,
        pids_limit: int = DEFAULT_PIDS_LIMIT,
        user: str = DEFAULT_CONTAINER_USER,
        read_only_root: bool = True,
        extra_run_args: Sequence[str] | None = None,
        env: Mapping[str, str] | None = None,
        policy: ShellPolicy | None = None,
        timeout: float | None = 30.0,
        max_output_bytes: int = 64 * 1024,
        approval_mode: Literal["always_require", "never_require"] = "always_require",
        on_command: Callable[[str], None] | None = None,
        docker_binary: str = "docker",
        shell: str = "bash",
    ) -> None:
        if mode not in ("persistent", "stateless"):
            raise ValueError(f"mode must be 'persistent' or 'stateless', got {mode!r}")
        _validate_extra_run_args(tuple(extra_run_args or ()))
        self._image = image
        self._container_name = container_name or f"af-shell-{secrets.token_hex(6)}"
        self._mode: ShellMode = mode
        self._host_workdir: str | None = os.fspath(host_workdir) if host_workdir is not None else None
        self._workdir = workdir
        self._mount_readonly = mount_readonly
        self._network = network
        self._memory = memory
        self._pids_limit = pids_limit
        self._user = user
        self._read_only_root = read_only_root
        self._extra_run_args = tuple(extra_run_args or ())
        self._env = dict(env or {})
        self._policy = policy or ShellPolicy()
        self._timeout = timeout
        self._max_output_bytes = max_output_bytes
        self._approval_mode: Literal["always_require", "never_require"] = approval_mode
        self._on_command = on_command
        self._binary = docker_binary
        self._shell = shell
        self._session: ShellSession | None = None
        self._container_started = False
        self._lifecycle_lock: asyncio.Lock | None = None

    def _get_lifecycle_lock(self) -> asyncio.Lock:
        if self._lifecycle_lock is None:
            self._lifecycle_lock = asyncio.Lock()
        return self._lifecycle_lock

    # ------------------------------------------------------------------ lifecycle

    async def start(self) -> None:
        """Pull/start the container and (if persistent) the inner shell session."""
        # Stateless mode never uses the long-lived container (every call goes
        # through ``docker run --rm``), so start()/close() are no-ops.
        if self._mode == "stateless":
            return
        async with self._get_lifecycle_lock():
            if self._container_started:
                if self._session is not None:
                    await self._session.start()
                return
            await self._start_container()
            self._container_started = True
            argv = build_exec_argv(
                binary=self._binary,
                container_name=self._container_name,
                interactive=True,
                shell=self._shell,  # nosec B604 - 'shell' is the binary name kwarg, not subprocess shell=True
            )
            self._session = ShellSession(
                argv,
                workdir=None,  # workdir is set on the container itself
                env=None,
                max_output_bytes=self._max_output_bytes,
            )
            await self._session.start()

    async def close(self) -> None:
        """Stop the inner shell session and tear down the container."""
        if self._mode == "stateless":
            return
        async with self._get_lifecycle_lock():
            if self._session is not None:
                try:
                    await self._session.close()
                finally:
                    self._session = None
            if self._container_started:
                await self._stop_container()
                self._container_started = False

    async def __aenter__(self) -> DockerShellTool:
        await self.start()
        return self

    async def __aexit__(self, *_exc: object) -> None:
        await self.close()

    # ------------------------------------------------------------------ execution

    async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
        """Execute ``command`` inside the container and return its result.

        Args:
            command: Shell command to execute.
            timeout: Optional per-call timeout in seconds overriding the
                tool's configured default. Enforced inside the executor
                (kills the container / interrupts the bash REPL) so the
                caller does not need to wrap the call in
                :func:`asyncio.wait_for`.
        """
        request = ShellRequest(command=command, workdir=self._workdir)
        decision = self._policy.evaluate(request)
        if decision.decision == "deny":
            raise ShellCommandError(f"Command rejected by policy: {decision.reason}")
        if self._on_command is not None:
            try:
                self._on_command(command)
            except Exception:
                logger.exception("on_command hook raised")
        effective_timeout = self._timeout if timeout is None else timeout
        if self._mode == "persistent":
            if self._session is None:
                await self.start()
            if self._session is None:
                raise RuntimeError("DockerShellTool session failed to start")
            return await self._session.run(command, timeout=effective_timeout)
        return await self._run_stateless(command, timeout=effective_timeout)
# ------------------------------------------------------------------ stateless
async def _run_stateless(self, command: str, *, timeout: float | None) -> ShellResult:
"""Run a single command in a fresh ``docker run --rm`` container."""
per_call_name = f"af-shell-{secrets.token_hex(6)}"
argv = [
self._binary,
"run",
"--rm",
"-i",
"--name",
per_call_name,
"--user",
self._user,
"--network",
self._network,
"--memory",
self._memory,
"--pids-limit",
str(self._pids_limit),
"--cap-drop",
"ALL",
"--security-opt",
"no-new-privileges",
"--tmpfs",
"/tmp:rw,nosuid,nodev,size=64m", # noqa: S108, # nosec B108 - tmpfs inside the container, not on the host
"--workdir",
self._workdir,
]
if self._read_only_root:
argv.append("--read-only")
if self._host_workdir is not None:
ro = "ro" if self._mount_readonly else "rw"
argv.extend(["-v", f"{self._host_workdir}:{self._workdir}:{ro}"])
for k, v in self._env.items():
argv.extend(["-e", f"{k}={v}"])
argv.extend(self._extra_run_args)
argv.extend([self._image, self._shell, "-c", command])
started = time.monotonic()
proc = await asyncio.create_subprocess_exec(
*argv,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
timed_out = False
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=timeout)
except asyncio.TimeoutError:
timed_out = True
# Kill the container by name. ``--rm`` reaps it on a clean
# exit; if ``docker kill`` itself hangs or returns non-zero
# (daemon stuck, container with unkillable kernel task) the
# ``--rm`` reaper never fires and the container is leaked,
# so we explicitly fall back to ``docker rm -f``.
killer = await asyncio.create_subprocess_exec(
self._binary,
"kill",
"--signal",
"KILL",
per_call_name,
stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.DEVNULL,
)
kill_ok = False
try:
rc = await asyncio.wait_for(killer.wait(), timeout=5.0)
kill_ok = rc == 0
except asyncio.TimeoutError:
killer.kill()
with contextlib.suppress(Exception):
await killer.wait()
if not kill_ok:
logger.warning(
"docker kill of stateless container %s did not succeed; "
"attempting docker rm -f to avoid a container leak",
per_call_name,
)
reaper = await asyncio.create_subprocess_exec(
self._binary,
"rm",
"-f",
per_call_name,
stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.PIPE,
)
try:
_, rerr = await asyncio.wait_for(reaper.communicate(), timeout=5.0)
if reaper.returncode != 0:
logger.error(
"docker rm -f %s failed (rc=%s, err=%s); container may be leaked",
per_call_name,
reaper.returncode,
rerr.decode("utf-8", errors="replace").strip(),
)
except asyncio.TimeoutError:
reaper.kill()
with contextlib.suppress(Exception):
await reaper.wait()
logger.error(
"docker rm -f %s timed out; container may be leaked",
per_call_name,
)
try:
stdout_bytes, stderr_bytes = await proc.communicate()
except Exception:
# Pipe/socket failures after kill leave us with no captured
# output. Log so operators can correlate empty results with
# the timeout teardown (otherwise the model just sees
# exit_code=-1 with no stdout/stderr).
logger.warning(
"failed to drain stdout/stderr after killing stateless container %s",
per_call_name,
exc_info=True,
)
stdout_bytes, stderr_bytes = b"", b""
duration_ms = int((time.monotonic() - started) * 1000)
stdout_str, stdout_truncated = _truncate_bytes(stdout_bytes or b"", self._max_output_bytes)
stderr_str, stderr_truncated = _truncate_bytes(stderr_bytes or b"", self._max_output_bytes)
return ShellResult(
stdout=stdout_str,
stderr=stderr_str,
exit_code=proc.returncode if proc.returncode is not None else -1,
duration_ms=duration_ms,
truncated=stdout_truncated or stderr_truncated,
timed_out=timed_out,
)
# ------------------------------------------------------------------ container ops
async def _start_container(self) -> None:
argv = build_run_argv(
binary=self._binary,
image=self._image,
container_name=self._container_name,
user=self._user,
network=self._network,
memory=self._memory,
pids_limit=self._pids_limit,
workdir=self._workdir,
host_workdir=self._host_workdir,
mount_readonly=self._mount_readonly,
read_only_root=self._read_only_root,
extra_env=self._env,
extra_args=self._extra_run_args,
)
proc = await asyncio.create_subprocess_exec(
*argv,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
out, err = await proc.communicate()
if proc.returncode != 0:
raise DockerNotAvailableError(
f"Failed to start container ({proc.returncode}): {err.decode('utf-8', errors='replace').strip()}"
)
logger.info(
"started docker container %s (id=%s)",
self._container_name,
out.decode("utf-8", errors="replace").strip()[:12],
)
async def _stop_container(self) -> None:
# Use docker rm -f for a hard shutdown. With --rm on the run
# command, this also reaps the container.
async def _attempt(timeout: float) -> tuple[int | None, str]:
"""Run a single ``docker rm -f`` attempt. Returns (rc, stderr)."""
proc = await asyncio.create_subprocess_exec(
self._binary,
"rm",
"-f",
self._container_name,
stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.PIPE,
)
try:
_, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
return proc.returncode, err.decode("utf-8", errors="replace").strip()
except asyncio.TimeoutError:
proc.kill()
with contextlib.suppress(Exception):
await proc.wait()
return None, "docker rm -f timed out"
rc, err = await _attempt(timeout=10.0)
if rc == 0:
return
logger.warning(
"docker rm -f %s did not succeed on first attempt (rc=%s, err=%s); retrying",
self._container_name,
rc,
err,
)
rc, err = await _attempt(timeout=5.0)
if rc != 0:
# Container may have been leaked. Surface the name so an
# operator can clean it up manually.
logger.error(
"docker rm -f %s failed after retry (rc=%s, err=%s); "
"container may be running and require manual cleanup",
self._container_name,
rc,
err,
)
# ------------------------------------------------------------------ AF wiring
def as_function(
self,
*,
name: str = "run_shell",
description: str | None = None,
) -> FunctionTool:
"""Return a :class:`~agent_framework.FunctionTool` bound to this instance."""
async def _run_shell(command: str) -> str:
try:
result = await self.run(command)
except ShellCommandError as exc:
return str(exc)
return result.format_for_model()
effective_description = description or _default_description(self._mode)
_run_shell.__doc__ = effective_description
return tool(
func=_run_shell,
name=name,
description=effective_description,
approval_mode=self._approval_mode,
kind=SHELL_TOOL_KIND_VALUE,
)
@@ -0,0 +1,281 @@
# Copyright (c) Microsoft. All rights reserved.
"""Shell environment context provider.
Probes the underlying shell (OS, family/version, working directory,
configured CLI tools) once per provider lifetime and injects an
instructions block so the agent emits commands in the correct shell
idiom rather than defaulting to bash syntax inside a PowerShell session
or vice versa. The probe runs through any :class:`ShellExecutor`, so the
same provider works with both :class:`LocalShellTool` and
:class:`DockerShellTool`.
"""
from __future__ import annotations
import asyncio
import platform
import re
import sys
from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, ClassVar
from agent_framework import AgentSession, ContextProvider, SessionContext, SupportsAgentRun
from ._executor_base import ShellExecutor
from ._types import ShellCommandError, ShellExecutionError, ShellResult, ShellTimeoutError
class ShellFamily(str, Enum):
"""Shell families recognised by the provider."""
POSIX = "posix"
POWERSHELL = "powershell"
@dataclass(frozen=True)
class ShellEnvironmentSnapshot:
"""Point-in-time snapshot of the shell environment.
Attributes:
family: Detected (or configured) shell family.
os_description: A short OS description from :mod:`platform`.
shell_version: Reported shell version, or ``None`` when probing
failed or the shell did not report one.
working_directory: CWD reported by the shell, or empty string
when probing failed.
tool_versions: Map of probed CLI tool name to reported version.
``None`` values indicate the tool name was rejected as invalid,
the tool was not installed, ``--version`` exited non-zero, or
the probe hit its timeout or another expected executor error.
"""
family: ShellFamily
os_description: str
shell_version: str | None
working_directory: str
tool_versions: Mapping[str, str | None]
@dataclass(frozen=True)
class ShellEnvironmentProviderOptions:
"""Configuration for :class:`ShellEnvironmentProvider`.
Attributes:
probe_tools: CLI tools whose ``--version`` output is probed.
override_family: Optional override for the auto-detected family.
When ``None``, the family is inferred from :data:`sys.platform`
(Windows → PowerShell, otherwise POSIX). Set this when
running against a non-default shell (e.g. bash on Windows
via WSL, or pwsh on Linux).
probe_timeout: Per-probe execution timeout in seconds. Probes
that exceed this are recorded as missing rather than raised
to the agent.
instructions_formatter: Optional callable that renders the
snapshot as the instructions block. When ``None``, the
built-in :func:`default_instructions_formatter` is used.
"""
probe_tools: Sequence[str] = field(
default_factory=lambda: ("git", "node", "python", "docker"),
)
override_family: ShellFamily | None = None
probe_timeout: float = 5.0
instructions_formatter: Callable[[ShellEnvironmentSnapshot], str] | None = None
_TOOL_NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")
def _detect_family() -> ShellFamily:
return ShellFamily.POWERSHELL if sys.platform == "win32" else ShellFamily.POSIX
def _first_non_empty_line(text: str) -> str | None:
for line in text.splitlines():
stripped = line.strip()
if stripped:
return stripped
return None
class ShellEnvironmentProvider(ContextProvider):
""":class:`ContextProvider` that injects a shell-environment block.
The provider runs a small set of probe commands against the supplied
:class:`ShellExecutor` once, caches the resulting
:class:`ShellEnvironmentSnapshot`, and on every ``before_run`` adds a
formatted instructions block to the session context. It does not
register any tools.
Probe failures from a narrow set of expected error types are recorded
as ``None`` fields in the snapshot (per-probe timeout, policy
rejection, executor spawn failure). Other exceptions propagate so
bugs are not silently swallowed.
A missing CLI never fails the agent: the model simply sees fewer
hints in its system prompt.
"""
DEFAULT_SOURCE_ID: ClassVar[str] = "shell_environment"
def __init__(
self,
executor: ShellExecutor,
options: ShellEnvironmentProviderOptions | None = None,
*,
source_id: str | None = None,
) -> None:
super().__init__(source_id or self.DEFAULT_SOURCE_ID)
self._executor = executor
self._options = options or ShellEnvironmentProviderOptions()
self._snapshot: ShellEnvironmentSnapshot | None = None
self._lock = asyncio.Lock()
@property
def current_snapshot(self) -> ShellEnvironmentSnapshot | None:
"""The most recent snapshot, or ``None`` before the first probe."""
return self._snapshot
async def refresh(self) -> ShellEnvironmentSnapshot:
"""Force a re-probe and replace the cached snapshot.
Useful when the agent has changed something the snapshot depends
on, e.g. installed a new CLI mid-session.
"""
async with self._lock:
snapshot = await self._probe()
self._snapshot = snapshot
return snapshot
async def before_run(
self,
*,
agent: SupportsAgentRun,
session: AgentSession,
context: SessionContext,
state: dict[str, Any],
) -> None:
snapshot = await self._get_or_probe()
formatter = self._options.instructions_formatter or default_instructions_formatter
context.extend_instructions(self.source_id, formatter(snapshot))
async def _get_or_probe(self) -> ShellEnvironmentSnapshot:
# Double-checked: return any already-cached snapshot without
# acquiring the lock; otherwise serialize the first probe so
# concurrent first-callers wait for a single result. A failed
# probe leaves _snapshot as None so the next call retries.
if self._snapshot is not None:
return self._snapshot
async with self._lock:
if self._snapshot is None:
self._snapshot = await self._probe()
return self._snapshot
async def _probe(self) -> ShellEnvironmentSnapshot:
family = self._options.override_family or _detect_family()
await self._executor.start()
shell_version, working_dir = await self._probe_shell_and_cwd(family)
tool_versions: dict[str, str | None] = {}
for tool in self._options.probe_tools:
# Skip case-insensitive duplicates so a caller passing
# ("git", "GIT") does not probe twice.
if tool.lower() in {existing.lower() for existing in tool_versions}:
continue
tool_versions[tool] = await self._probe_tool_version(tool)
return ShellEnvironmentSnapshot(
family=family,
os_description=platform.platform(),
shell_version=shell_version,
working_directory=working_dir,
tool_versions=tool_versions,
)
async def _probe_shell_and_cwd(self, family: ShellFamily) -> tuple[str | None, str]:
if family is ShellFamily.POWERSHELL:
command = (
'Write-Output ("VERSION=" + $PSVersionTable.PSVersion.ToString()); '
'Write-Output ("CWD=" + (Get-Location).Path)'
)
else:
command = 'echo "VERSION=${BASH_VERSION:-${ZSH_VERSION:-unknown}}"; echo "CWD=$(pwd)"'
result = await self._run_probe(command)
if result is None:
return None, ""
version: str | None = None
cwd = ""
for raw in result.stdout.splitlines():
line = raw.strip()
if line.startswith("VERSION="):
value = line[len("VERSION=") :].strip()
version = None if not value or value == "unknown" else value
elif line.startswith("CWD="):
cwd = line[len("CWD=") :].strip()
return version, cwd
async def _probe_tool_version(self, tool: str) -> str | None:
# Reject anything that is not a plain identifier — the tool name
# is interpolated into a shell command, so quotes, $, ;, |, &,
# whitespace, etc. would allow command injection if the tool list
# were sourced from untrusted input.
if not tool or not _TOOL_NAME_PATTERN.match(tool):
return None
result = await self._run_probe(f"{tool} --version")
if result is None or result.exit_code != 0:
return None
# Some CLIs print their version banner on stderr rather than stdout.
line = _first_non_empty_line(result.stdout) or _first_non_empty_line(result.stderr)
return line if line else None
async def _run_probe(self, command: str) -> ShellResult | None:
try:
return await self._executor.run(command, timeout=self._options.probe_timeout)
except asyncio.TimeoutError:
return None
except (ShellCommandError, ShellExecutionError, ShellTimeoutError):
return None
def default_instructions_formatter(snapshot: ShellEnvironmentSnapshot) -> str:
"""Render ``snapshot`` as the default instructions block.
Public so callers that want to wrap or extend the default can call
it from a custom ``instructions_formatter``.
"""
lines: list[str] = ["## Shell environment"]
version_suffix = f" {snapshot.shell_version}" if snapshot.shell_version else ""
if snapshot.family is ShellFamily.POWERSHELL:
lines.append(f"You are operating a PowerShell{version_suffix} session on {snapshot.os_description}.")
lines.append("Use PowerShell idioms, NOT bash:")
lines.append("- Set environment variables with `$env:NAME = 'value'` (NOT `NAME=value`).")
lines.append("- Change directory with `Set-Location` or `cd`. Paths use `\\` separators.")
lines.append("- Reference environment variables as `$env:NAME` (NOT `$NAME`).")
lines.append("- The system temp directory is `[System.IO.Path]::GetTempPath()` (NOT `/tmp`).")
lines.append("- Pipe to `Out-Null` to suppress output (NOT `> /dev/null`).")
else:
lines.append(f"You are operating a POSIX shell{version_suffix} session on {snapshot.os_description}.")
lines.append("Use POSIX shell idioms (bash/sh).")
lines.append("- Set environment variables for the next command with `export NAME=value`.")
lines.append("- Reference environment variables as `$NAME` or `${NAME}`.")
lines.append("- Paths use `/` separators.")
if snapshot.working_directory:
lines.append(f"Working directory: {snapshot.working_directory}")
installed = [f"{name} ({version})" for name, version in snapshot.tool_versions.items() if version]
missing = [name for name, version in snapshot.tool_versions.items() if version is None]
if installed:
lines.append("Available CLIs: " + ", ".join(installed))
if missing:
lines.append("Not installed: " + ", ".join(missing))
return "\n".join(lines)
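The probe-once caching used by ``_get_or_probe`` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the provider itself: ``CachedProbe``, ``probe_calls`` and the sleeping ``_probe`` are hypothetical stand-ins for the real shell probes. It shows that concurrent first callers share a single probe when the cache check is repeated under an ``asyncio.Lock``.

```python
from __future__ import annotations

import asyncio


class CachedProbe:
    """Hypothetical stand-in for the provider's snapshot cache."""

    def __init__(self) -> None:
        self._snapshot: str | None = None
        self._lock = asyncio.Lock()
        self.probe_calls = 0  # counts how often the expensive probe ran

    async def _probe(self) -> str:
        self.probe_calls += 1
        await asyncio.sleep(0.01)  # simulate slow shell round-trips
        return "snapshot"

    async def get_or_probe(self) -> str:
        # Fast path: no lock needed once a snapshot is cached.
        if self._snapshot is not None:
            return self._snapshot
        async with self._lock:
            # Re-check under the lock: another caller may have probed
            # while this one was waiting.
            if self._snapshot is None:
                self._snapshot = await self._probe()
            return self._snapshot


async def demo() -> int:
    cache = CachedProbe()
    await asyncio.gather(*(cache.get_or_probe() for _ in range(5)))
    return cache.probe_calls
```

If ``_probe`` raises, the cached value stays ``None`` and the next caller retries, which matches the comment in ``_get_or_probe``.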
@@ -0,0 +1,93 @@
# Copyright (c) Microsoft. All rights reserved.
"""Stateless shell executor.
Each call to :func:`run_stateless` spawns a fresh subprocess, captures
stdout/stderr concurrently, enforces a timeout by killing the whole process
tree, and truncates oversized output. Matches the behaviour of AutoGen's
``LocalCommandLineCodeExecutor`` and OpenAI Agents SDK's ``local_shell``
protocol.
"""
from __future__ import annotations
import asyncio
import subprocess # noqa: S404 # nosec B404 - executing user shell commands is the whole point
import sys
import time
from collections.abc import Mapping, Sequence
from ._killtree import kill_process_tree
from ._resolve import is_powershell
from ._truncate import truncate_head_tail as _truncate
from ._types import ShellResult
def _popen_kwargs_for_group() -> dict[str, object]:
"""Platform-specific process-group isolation so we can kill children too."""
if sys.platform == "win32":
# CREATE_NEW_PROCESS_GROUP lets CTRL_BREAK_EVENT hit the whole group.
return {"creationflags": subprocess.CREATE_NEW_PROCESS_GROUP} # type: ignore[attr-defined]
return {"start_new_session": True}
async def run_stateless(
argv: Sequence[str],
command: str,
*,
workdir: str | None,
env: Mapping[str, str] | None,
timeout: float | None,
max_output_bytes: int,
) -> ShellResult:
"""Execute ``command`` via ``argv`` + ``-c``/``-Command`` + command.
Args:
argv: Base shell invocation (from :func:`resolve_shell` with
``interactive=False``), so it already ends with ``-c`` /
``-Command``; ``command`` is appended to it verbatim.
command: User command string.
workdir: Working directory, or ``None`` to inherit.
env: Environment variables, or ``None`` to inherit the current
process environment.
timeout: Seconds before the process tree is killed; ``None`` disables.
max_output_bytes: Byte cap applied to each stream (stdout and stderr separately) before truncation.
"""
# For PowerShell we prepend a UTF-8 encoding preamble so powershell.exe
# on Windows (which defaults to a legacy code page) doesn't mojibake non-ASCII output.
if is_powershell(argv):
command = "$OutputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false); " + command
full_argv = [*argv, command]
started = time.monotonic()
proc = await asyncio.create_subprocess_exec(
*full_argv,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=workdir,
env=dict(env) if env is not None else None,
**_popen_kwargs_for_group(), # type: ignore[arg-type]
)
timed_out = False
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=timeout)
except asyncio.TimeoutError:
timed_out = True
await kill_process_tree(proc)
# Drain any queued output.
try:
stdout_bytes, stderr_bytes = await proc.communicate()
except Exception:
stdout_bytes, stderr_bytes = b"", b""
duration_ms = int((time.monotonic() - started) * 1000)
stdout_str, stdout_truncated = _truncate(stdout_bytes or b"", max_output_bytes)
stderr_str, stderr_truncated = _truncate(stderr_bytes or b"", max_output_bytes)
return ShellResult(
stdout=stdout_str,
stderr=stderr_str,
exit_code=proc.returncode if proc.returncode is not None else -1,
duration_ms=duration_ms,
truncated=stdout_truncated or stderr_truncated,
timed_out=timed_out,
)
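The timeout handling in ``run_stateless`` follows a wait, kill, drain sequence. The sketch below reproduces that shape with the standard library only. It is a simplified illustration: ``run_with_timeout`` is a hypothetical helper, and it calls ``proc.kill()`` on the direct child instead of ``kill_process_tree``, so grandchildren would survive here.

```python
import asyncio
import sys


async def run_with_timeout(argv: list[str], timeout: float) -> tuple[bytes, bool]:
    """Run argv, returning (stdout, timed_out). Hypothetical helper."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    timed_out = False
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        timed_out = True
        # Simplification: kills only the direct child, not its descendants.
        proc.kill()
        # Second communicate() waits for exit and collects remaining output.
        stdout, _ = await proc.communicate()
    return stdout, timed_out


# A child that finishes quickly, and one that outlives the timeout.
FAST = [sys.executable, "-c", "print('hi')"]
SLOW = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Calling ``asyncio.run(run_with_timeout(SLOW, 0.5))`` should report ``timed_out=True`` well before the child's 30-second sleep would have ended.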
@@ -0,0 +1,65 @@
# Copyright (c) Microsoft. All rights reserved.
"""Shell executor protocol.
A :class:`ShellExecutor` is the swappable backend for shell-tool execution.
``LocalShellTool`` runs commands directly on the host with no process-level
isolation; the approval-in-the-loop gate is the intended boundary.
``DockerShellTool`` runs commands inside a container — when the container
runtime is trusted and the default isolation flags are kept, the container
is the intended boundary instead of approval.
The protocol is intentionally minimal so callers can plug in their own
executor (e.g. a Firecracker microVM, a remote SSH host, a WASI runtime
that ships a busybox-WASM build) without forking the framework.
**Single-session ownership.** An executor instance — and the shell tool
that wraps it — is intended to serve a single conversation / agent session,
i.e. a single user. In persistent mode the executor owns a long-lived
shell process (and, for ``DockerShellTool``, a long-lived container) whose
state — working directory, exported variables, command history, in-flight
background jobs, files written to the container — is visible to every
subsequent command. A single stdin/stdout pipe serializes every call,
and the framework does not isolate one caller's state from another's.
Build one executor / one shell tool per session, treat it as owned by
that session for its lifetime, and close it when the session ends. Do
not share a persistent-mode instance across users, tenants, or concurrent
conversations. If a shared instance is genuinely required, construct the
shell tool with ``mode="stateless"`` so every call spawns a fresh process
or container.
"""
from __future__ import annotations
from typing import Protocol, runtime_checkable
from ._types import ShellResult
@runtime_checkable
class ShellExecutor(Protocol):
"""Async-context-manageable backend that runs shell commands."""
async def start(self) -> None:
"""Eagerly initialise the backend (no-op if already started)."""
async def close(self) -> None:
"""Tear down all backend resources. Idempotent."""
async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
"""Execute ``command`` and return its result.
Args:
command: The shell command to execute.
timeout: Optional per-call timeout in seconds. When ``None``,
the executor uses its configured default. Implementations
**must** enforce this timeout cancellation-safely (e.g.
kill the subprocess or tear down the session on timeout)
so callers can rely on the timeout to bound execution
without leaking processes on cancellation.
"""
...
async def __aenter__(self) -> ShellExecutor: ...
async def __aexit__(self, *exc: object) -> None: ...
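As a rough illustration of plugging in a custom backend, the sketch below defines a toy executor with the same method shape as the protocol. Everything here is hypothetical: ``FakeResult`` stands in for ``ShellResult``, ``ExecutorLike`` is a local copy of the protocol's methods, and ``RecordingExecutor`` records commands rather than running them.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class FakeResult:
    """Hypothetical stand-in for ShellResult (not the framework's type)."""

    stdout: str
    exit_code: int
    timed_out: bool = False


@runtime_checkable
class ExecutorLike(Protocol):
    """Local copy of the protocol's method shape, for illustration only."""

    async def start(self) -> None: ...
    async def close(self) -> None: ...
    async def run(self, command: str, *, timeout: float | None = None) -> FakeResult: ...


class RecordingExecutor:
    """Toy backend: records commands instead of executing them."""

    def __init__(self) -> None:
        self.started = False
        self.commands: list[str] = []

    async def start(self) -> None:
        self.started = True  # calling it again changes nothing

    async def close(self) -> None:
        self.started = False  # idempotent

    async def run(self, command: str, *, timeout: float | None = None) -> FakeResult:
        if not self.started:
            await self.start()
        self.commands.append(command)
        return FakeResult(stdout=f"ran: {command}", exit_code=0)

    async def __aenter__(self) -> RecordingExecutor:
        await self.start()
        return self

    async def __aexit__(self, *exc: object) -> None:
        await self.close()


async def demo() -> FakeResult:
    async with RecordingExecutor() as ex:
        # runtime_checkable isinstance() only checks that the methods exist.
        assert isinstance(ex, ExecutorLike)
        return await ex.run("echo hi", timeout=1.0)
```

A real backend would also have to honour the ``timeout`` argument in a cancellation-safe way, as the ``run`` docstring requires; the toy above ignores it.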
@@ -0,0 +1,132 @@
# Copyright (c) Microsoft. All rights reserved.
"""Cross-OS process-tree termination.
Delegates to :mod:`psutil` for process introspection when available, with
a stdlib fallback. Tree-kill matters because a timed-out shell command can
spawn child processes (``make``, network tools, watchers, …); leaving
them running would defeat the timeout.
Notes:
* On Windows, ``taskkill.exe`` is resolved to its absolute system path so
a modified ``PATH`` cannot redirect the call to a different binary.
* psutil's ``Process.children(recursive=True)`` walks parent-child
relationships via OS APIs (``CreateToolhelp32Snapshot`` on Windows,
``/proc`` on Linux, ``proc_listpids`` on macOS), which is why it is
preferred over a hand-rolled platform-conditional implementation.
"""
from __future__ import annotations
import asyncio
import contextlib
import os
import signal
import sys
try: # pragma: no cover - importable on every platform we ship
import psutil # type: ignore[import-untyped]
_has_psutil = True
except ImportError: # pragma: no cover
_has_psutil = False
psutil = None # type: ignore[assignment]
_taskkill_path: str | None = None
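def _resolve_taskkill() -> str:
"""Return the absolute path to taskkill.exe to defeat PATH poisoning.
Falls back to the bare name ``taskkill`` (resolved via ``PATH``) only
when the expected System32 binary is missing.
"""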
global _taskkill_path
if _taskkill_path is not None:
return _taskkill_path
system_root = os.environ.get("SystemRoot") or os.environ.get("SYSTEMROOT") or r"C:\Windows" # noqa: SIM112
candidate = os.path.join(system_root, "System32", "taskkill.exe")
_taskkill_path = candidate if os.path.isfile(candidate) else "taskkill"
return _taskkill_path
async def kill_process_tree(
proc: asyncio.subprocess.Process,
*,
grace: float = 2.0,
) -> None:
"""Terminate ``proc`` and all of its descendants. Best-effort, never raises."""
if proc.returncode is not None:
return
if _has_psutil:
await _kill_via_psutil(proc, grace=grace)
return
await _kill_via_stdlib(proc, grace=grace)
async def _kill_via_psutil(
proc: asyncio.subprocess.Process,
*,
grace: float,
) -> None:
if psutil is None:
raise RuntimeError("_kill_via_psutil called without psutil available")
try:
parent = psutil.Process(proc.pid)
except psutil.NoSuchProcess:
return
try:
descendants = parent.children(recursive=True)
except psutil.NoSuchProcess:
descendants = []
victims = [parent, *descendants]
# Phase 1: SIGTERM on POSIX. On Windows psutil's terminate() is an alias
# for kill(), so there is no graceful phase on that platform.
for v in victims:
with contextlib.suppress(psutil.NoSuchProcess, psutil.AccessDenied):
v.terminate()
# Wait briefly for graceful exit.
with contextlib.suppress(asyncio.TimeoutError):
await asyncio.wait_for(proc.wait(), timeout=grace)
# Phase 2: SIGKILL anything still alive.
for v in victims:
with contextlib.suppress(psutil.NoSuchProcess, psutil.AccessDenied):
if v.is_running():
v.kill()
with contextlib.suppress(asyncio.TimeoutError):
await asyncio.wait_for(proc.wait(), timeout=grace)
async def _kill_via_stdlib(
proc: asyncio.subprocess.Process,
*,
grace: float,
) -> None:
"""Fallback when psutil isn't installed. Less robust on Windows."""
if sys.platform == "win32":
try:
killer = await asyncio.create_subprocess_exec(
_resolve_taskkill(),
"/T",
"/F",
"/PID",
str(proc.pid),
stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.DEVNULL,
)
with contextlib.suppress(asyncio.TimeoutError):
await asyncio.wait_for(killer.wait(), timeout=grace)
if killer.returncode is None:
killer.kill()
except (FileNotFoundError, OSError):
pass
with contextlib.suppress(ProcessLookupError, OSError):
proc.kill()
return
try:
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
with contextlib.suppress(asyncio.TimeoutError):
await asyncio.wait_for(proc.wait(), timeout=grace)
return
os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
except (ProcessLookupError, PermissionError):
pass
@@ -0,0 +1,125 @@
# Copyright (c) Microsoft. All rights reserved.
r"""Policy model for :class:`LocalShellTool` and :class:`DockerShellTool`.
``ShellPolicy`` is evaluated *before* approval and *before* execution. It
lets callers define allow/deny rules and an optional final custom callback.
.. warning::
**Not a security boundary; not even a security feature.** ``ShellPolicy``
is a UX pre-filter: it gives operators a way to surface a friendly error
for site-specific patterns (e.g. "we don't run ``ssh`` from this agent",
"block our prod hostname") before approval and before execution. It is
**not** a defense against a malicious model or prompt-injected input.
Regex matching on the command spelling cannot see what the shell will
actually execute after expansion. Trivial bypasses include backslash or
quote insertion (``r\m -rf /``, ``r''m -rf /``), variable expansion
(``${RM:=rm} -rf /``), interpreter escape hatches
(``python -c "import os; os.system('rm -rf /')"``), base64 / hex / printf
smuggling (``eval $(printf '\x72\x6d -rf /')``), command substitution
(``$(base64 -d <<<...)``), envvar splicing
(``$(A=r B=m; echo $A$B) -rf /``), and absolute paths (``/usr/bin/rm``
slips past anchored patterns such as ``^rm\b``; only a loose pattern like
``\brm\b`` still matches it).
**No default patterns.** ``ShellPolicy()`` constructs an empty deny-list.
The framework deliberately ships no built-in patterns so it does not
give a false impression of safety. A survey of competing agent frameworks
(LangChain, AutoGen, OpenAI Agents SDK, Claude Code, Goose, Continue.dev,
OpenHands, Open Interpreter, Aider, smolagents, LangGraph) found that
none use regex matching as a primary security control; AutoGen v2
explicitly removed their built-in deny-list.
The actual security boundary is **(a) approval-in-the-loop** (default
``approval_mode="always_require"``) and **(b) operator trust / sandbox
tier**. For untrusted input use ``DockerShellTool`` or
``HyperlightCodeActProvider`` (microVM); pair either with approval gating.
"""
from __future__ import annotations
import re
from collections.abc import Callable, Sequence
from dataclasses import dataclass, field
from typing import Literal, Union
PatternLike = Union[str, re.Pattern[str]]
@dataclass(frozen=True)
class ShellRequest:
"""A single command awaiting a policy decision."""
command: str
workdir: str | None = None
@dataclass(frozen=True)
class ShellDecision:
"""Result of a policy evaluation."""
decision: Literal["allow", "deny"]
reason: str = ""
def _compile_patterns(patterns: Sequence[PatternLike]) -> tuple[re.Pattern[str], ...]:
compiled: list[re.Pattern[str]] = []
for pat in patterns:
compiled.append(pat if isinstance(pat, re.Pattern) else re.compile(pat, re.IGNORECASE))
return tuple(compiled)
@dataclass
class ShellPolicy:
"""Layered allow/deny policy for shell commands.
Evaluation order (first hit wins):
1. ``denylist`` — if any pattern matches, the command is **denied**.
2. ``allowlist`` — if set and no pattern matches, the command is
**denied**. When ``allowlist`` is ``None`` the allow rule is skipped.
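3. ``custom`` — user-supplied callback, consulted only for commands that
passed steps 1 and 2. It may return a :class:`ShellDecision` that becomes
the final outcome, or ``None`` to fall through; it cannot overturn an
earlier denial.
4. Otherwise the command is **allowed**.
String patterns are compiled case-insensitively; pre-compiled
:class:`re.Pattern` objects are used exactly as supplied.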
Defaults are empty: ``ShellPolicy()`` allows every non-empty command.
Supply ``denylist`` and/or ``allowlist`` explicitly to enable filtering.
See the module docstring for why the framework does not ship default
deny patterns.
"""
denylist: Sequence[PatternLike] = field(default_factory=tuple)
allowlist: Sequence[PatternLike] | None = None
custom: Callable[[ShellRequest], ShellDecision | None] | None = None
_denies: tuple[re.Pattern[str], ...] = field(init=False, repr=False, compare=False)
_allows: tuple[re.Pattern[str], ...] | None = field(init=False, repr=False, compare=False)
def __post_init__(self) -> None:
self._denies = _compile_patterns(self.denylist)
self._allows = _compile_patterns(self.allowlist) if self.allowlist is not None else None
def evaluate(self, request: ShellRequest) -> ShellDecision:
"""Return an allow/deny decision for ``request``.
Empty/whitespace-only commands are denied (there is nothing to
run). With default settings (no denylist, no allowlist) every
non-empty command is allowed.
"""
command = request.command.strip()
if not command:
return ShellDecision("deny", "command is empty")
for pat in self._denies:
if pat.search(command):
return ShellDecision("deny", f"matches denylist pattern: {pat.pattern}")
if self._allows is not None and not any(pat.search(command) for pat in self._allows):
return ShellDecision("deny", "command does not match allowlist")
if self.custom is not None:
override = self.custom(request)
if override is not None:
return override
return ShellDecision("allow")
def evaluate_command(self, command: str) -> ShellDecision:
"""Convenience: evaluate a bare command with no workdir context."""
return self.evaluate(ShellRequest(command=command))
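The module docstring's warning about regex deny-lists can be checked with a few lines of standard-library Python. This sketch does not use ``ShellPolicy``; ``DENY`` and ``is_denied`` are hypothetical names, and the pattern is a deliberately loose ``\brm\b`` compiled case-insensitively, the way string patterns are compiled in this module.

```python
import re

# A loose deny pattern of the kind an operator might write.
DENY = re.compile(r"\brm\b", re.IGNORECASE)


def is_denied(command: str) -> bool:
    """Return True when the deny pattern matches the command spelling."""
    return DENY.search(command.strip()) is not None


# Spellings the pattern catches.
CAUGHT = [
    "rm -rf /tmp/x",
    "/usr/bin/rm -rf /tmp/x",
]

# Spellings a POSIX shell would still resolve to ``rm`` after quote
# removal or expansion, but which never contain the literal word "rm".
MISSED = [
    "r''m -rf /tmp/x",                   # quote insertion
    "r\\m -rf /tmp/x",                   # backslash insertion (r\m)
    "$(A=r B=m; echo $A$B) -rf /tmp/x",  # envvar splicing
]

for cmd in CAUGHT + MISSED:
    print(f"{'deny ' if is_denied(cmd) else 'allow'}  {cmd}")
```

The three commands in ``MISSED`` pass the filter untouched, which is why the docstring treats the deny-list as a UX pre-filter and points to approval gating or a sandbox tier as the actual boundary.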
@@ -0,0 +1,111 @@
# Copyright (c) Microsoft. All rights reserved.
"""Cross-platform shell discovery for :class:`LocalShellTool`."""
from __future__ import annotations
import os
import shutil
import sys
from collections.abc import Sequence
from ._types import ShellExecutionError
_ENV_OVERRIDE = "AGENT_FRAMEWORK_SHELL"
def resolve_shell(shell: str | Sequence[str] | None, *, interactive: bool) -> list[str]:
"""Resolve the shell invocation argv.
Priority:
1. Explicit ``shell`` argument (string is split via shlex rules; sequence
is used verbatim).
2. ``AGENT_FRAMEWORK_SHELL`` environment variable.
3. Platform default.
When ``interactive=False`` (stateless mode), the returned argv is
guaranteed to end with a ``-c`` (POSIX) or ``-Command`` (PowerShell)
flag so the caller can append a command string verbatim. Overrides
that already include the flag are left as-is.
Args:
shell: Optional override supplied by the caller.
interactive: When ``True`` (persistent mode), the returned argv is
suitable for a long-lived session that reads commands from
``stdin``. When ``False`` (stateless mode), the caller will
append a command string to this argv (so the argv must end
with ``-c`` / ``-Command``).
"""
if shell is not None:
if isinstance(shell, str):
import shlex
parts = shlex.split(shell)
if not parts:
raise ShellExecutionError("shell override must not be empty")
else:
parts = list(shell)
if not parts:
raise ShellExecutionError("shell override must not be empty")
return parts if interactive else _ensure_command_flag(parts)
env_override = os.environ.get(_ENV_OVERRIDE)
if env_override:
import shlex
parts = shlex.split(env_override)
if parts:
return parts if interactive else _ensure_command_flag(parts)
if sys.platform == "win32":
binary = shutil.which("pwsh") or shutil.which("powershell")
if binary is None:
raise ShellExecutionError(
f"Neither 'pwsh' nor 'powershell' was found on PATH. Install PowerShell 7+ or set {_ENV_OVERRIDE}."
)
if interactive:
# Interactive persistent session reads from stdin via '-'.
return [binary, "-NoLogo", "-NoProfile", "-NonInteractive", "-Command", "-"]
return [binary, "-NoLogo", "-NoProfile", "-NonInteractive", "-Command"]
for candidate in ("/bin/bash", "/usr/bin/bash", "/bin/sh", "/usr/bin/sh"):
if os.path.exists(candidate):
if interactive:
return [candidate, "--noprofile", "--norc"] if candidate.endswith("bash") else [candidate]
return [candidate, "-c"]
# Last-ditch fallback: let PATH resolve 'sh'.
sh = shutil.which("sh")
if sh is None:
raise ShellExecutionError(f"No POSIX shell found on PATH. Set {_ENV_OVERRIDE} to override.")
return [sh] if interactive else [sh, "-c"]
def is_powershell(argv: Sequence[str]) -> bool:
"""Return True when ``argv[0]`` appears to be PowerShell."""
if not argv:
return False
name = os.path.basename(argv[0]).lower()
return name in {"pwsh", "pwsh.exe", "powershell", "powershell.exe"}
def _ensure_command_flag(argv: list[str]) -> list[str]:
"""Append the right ``-c`` / ``-Command`` flag for stateless argv.
The caller (``run_stateless``) appends the user's command string
verbatim to this argv. If a user-supplied override omits the
``-c`` / ``-Command`` flag, the command would be misinterpreted
(POSIX shells treat the next positional arg as a script file
path). This helper normalises overrides so they execute correctly
in stateless mode.
"""
if not argv:
return argv
last = argv[-1].lower()
if is_powershell(argv):
if last in {"-command", "-c"}:
return argv
return [*argv, "-Command"]
if last == "-c":
return argv
return [*argv, "-c"]
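The flag normalisation above can be exercised without spawning a shell. The following is a standalone sketch that re-implements the two helpers under different names (`is_powershell_sketch` and `ensure_command_flag_sketch` are illustrative, not the package's API) so the behaviour for string overrides is visible.

```python
import os
import shlex

_PWSH_NAMES = {"pwsh", "pwsh.exe", "powershell", "powershell.exe"}


def is_powershell_sketch(argv):
    """True when argv[0]'s basename looks like a PowerShell binary."""
    return bool(argv) and os.path.basename(argv[0]).lower() in _PWSH_NAMES


def ensure_command_flag_sketch(argv):
    """Append -c / -Command unless the override already ends with it."""
    if not argv:
        return argv
    last = argv[-1].lower()
    if is_powershell_sketch(argv):
        return argv if last in {"-command", "-c"} else [*argv, "-Command"]
    return argv if last == "-c" else [*argv, "-c"]


# String overrides are tokenised with shlex before normalisation.
print(ensure_command_flag_sketch(shlex.split("/bin/bash --noprofile")))
print(ensure_command_flag_sketch(shlex.split("pwsh -NoLogo")))
print(ensure_command_flag_sketch(["sh", "-c"]))
```

The stateless caller can then append the command string as the final argv element in every case.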
@@ -0,0 +1,443 @@
# Copyright (c) Microsoft. All rights reserved.
"""Persistent shell session.
A :class:`ShellSession` launches a long-lived shell subprocess and executes
commands one at a time by writing them to stdin followed by a **sentinel
probe** that reports the exit status. Reading stdout until the sentinel
appears gives a reliable command boundary without relying on job control
or PTYs, which keeps the same code path working on bash, sh, and pwsh.
**Single-owner contract.** A :class:`ShellSession` is owned by exactly one
conversation / agent session — i.e. one user. The backing shell process
carries mutable state (cwd, exported variables, history, background jobs)
that every subsequent command can observe, and the internal
``asyncio.Lock`` serializes every call onto the single stdin/stdout pipe.
There is no per-caller isolation. The enclosing shell tool must not share
a single session across users, tenants, or concurrent conversations; it
must create one session per agent session and close it when the session
ends.
Notes:
* ``pwsh -NoProfile -NoLogo -NonInteractive -Command -`` waits for a
complete parse before executing, so multi-line ``try`` blocks stall
with stdin open. To avoid that, the user command is base64-encoded
and invoked with ``Invoke-Expression`` on a single line.
* ``Write-Output`` routes through the PowerShell pipeline formatter,
which may drop trailing newlines when stdout is redirected. The
sentinel is emitted via ``[Console]::WriteLine`` followed by an
explicit ``[Console]::Out.Flush()``.
* ``$LASTEXITCODE`` only tracks external-process exits, so the rc is
also derived from ``$?`` and caught exceptions.
* stdout and stderr are consumed by **persistent reader tasks** that
run for the lifetime of the session. Each ``run()`` snapshots buffer
offsets before writing the command and scans forward from there.
This avoids ``read() called while another coroutine is already
waiting`` errors from per-call ``wait_for(stream.read())`` loops and
prevents late stderr from being attributed to the next command.
"""
from __future__ import annotations
import asyncio
import base64
import os
import secrets
import signal
import sys
import time
from collections.abc import Mapping, Sequence
from ._killtree import kill_process_tree
from ._resolve import is_powershell
from ._truncate import truncate_head_tail as _truncate_bytes
from ._truncate import truncate_text_head_tail as _truncate_text
from ._types import ShellResult
_READ_CHUNK = 64 * 1024
_SHUTDOWN_GRACE = 2.0
# Extra grace window after the sentinel arrives to let late stderr drain.
_STDERR_QUIESCENCE = 0.05
class ShellSession:
"""A long-lived shell subprocess that executes commands via sentinels.
The session is not thread-safe, but it tolerates concurrent use within one
event loop: overlapping calls to :meth:`run` are serialised with an internal
:class:`asyncio.Lock`.
"""
def __init__(
self,
argv: Sequence[str],
*,
workdir: str | None = None,
env: Mapping[str, str] | None = None,
max_output_bytes: int = 64 * 1024,
) -> None:
self._argv = list(argv)
self._workdir = workdir
self._env = dict(env) if env is not None else None
self._max_output_bytes = max_output_bytes
self._proc: asyncio.subprocess.Process | None = None
# Serialises per-command execution onto the single stdin/stdout
# pipe. This is an ordering primitive within one owning session;
# it is NOT a multi-tenant isolation mechanism. ShellSession is
# single-owner — see the module docstring. The lock just
# guarantees concurrent calls from the one owner queue cleanly
# instead of interleaving on the pipe.
self._run_lock = asyncio.Lock()
# Serialises start/close so concurrent first-callers don't spawn
# multiple subprocesses.
self._lifecycle_lock = asyncio.Lock()
self._sentinel_tag = secrets.token_hex(8)
self._is_pwsh = is_powershell(argv)
# Persistent reader state. The reader tasks append into these
# buffers; _run_locked scans forward from a per-call offset.
self._stdout_buf = bytearray()
self._stderr_buf = bytearray()
self._stdout_event = asyncio.Event()
self._stdout_reader: asyncio.Task[None] | None = None
self._stderr_reader: asyncio.Task[None] | None = None
self._stdout_closed = False
# ------------------------------------------------------------------ lifecycle
async def start(self) -> None:
"""Spawn the shell if it isn't already running."""
async with self._lifecycle_lock:
if self._proc is not None and self._proc.returncode is None:
return
popen_kwargs: dict[str, object] = {}
if sys.platform == "win32":
import subprocess # noqa: S404 # nosec B404 - Win32 constants only
popen_kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP # type: ignore[attr-defined]
else:
popen_kwargs["start_new_session"] = True
self._proc = await asyncio.create_subprocess_exec(
*self._argv,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=self._workdir,
env=self._env,
**popen_kwargs, # type: ignore[arg-type]
)
# Reset buffer state in case this is a restart after close().
self._stdout_buf.clear()
self._stderr_buf.clear()
self._stdout_event = asyncio.Event()
self._stdout_closed = False
if self._proc.stdout is None or self._proc.stderr is None:
raise RuntimeError("subprocess pipes were not created; stdout/stderr unavailable")
self._stdout_reader = asyncio.create_task(self._reader(self._proc.stdout, self._stdout_buf, is_stdout=True))
self._stderr_reader = asyncio.create_task(
self._reader(self._proc.stderr, self._stderr_buf, is_stdout=False)
)
# Best-effort: make PowerShell emit UTF-8 and fail loudly on errors.
if self._is_pwsh:
await self._write_raw(
"$OutputEncoding = [Console]::OutputEncoding = "
"[System.Text.UTF8Encoding]::new($false);"
"$ErrorActionPreference = 'Stop'\n"
)
async def close(self) -> None:
"""Ask the shell to exit, then kill its process tree if it does not stop in time."""
async with self._lifecycle_lock:
proc = self._proc
self._proc = None
if proc is None or proc.returncode is not None:
await self._cancel_readers()
return
try:
if proc.stdin is not None and not proc.stdin.is_closing():
try:
proc.stdin.write(b"exit\n")
await proc.stdin.drain()
proc.stdin.close()
except (ConnectionResetError, BrokenPipeError):
pass
try:
await asyncio.wait_for(proc.wait(), timeout=_SHUTDOWN_GRACE)
except asyncio.TimeoutError:
await kill_process_tree(proc, grace=_SHUTDOWN_GRACE)
except Exception: # nosec B110 - best-effort shutdown; the finally block still cancels the reader tasks
pass
finally:
await self._cancel_readers()
async def _cancel_readers(self) -> None:
for t in (self._stdout_reader, self._stderr_reader):
if t is not None and not t.done():
t.cancel()
try:
await t
except (asyncio.CancelledError, Exception):
pass
self._stdout_reader = None
self._stderr_reader = None
async def __aenter__(self) -> ShellSession:
await self.start()
return self
async def __aexit__(self, *_exc: object) -> None:
await self.close()
# ------------------------------------------------------------------ execution
async def run(
self,
command: str,
*,
timeout: float | None,
) -> ShellResult:
"""Run ``command`` in the live session and return its result."""
await self.start()
async with self._run_lock:
return await self._run_locked(command, timeout=timeout)
async def _run_locked(self, command: str, *, timeout: float | None) -> ShellResult:
if self._proc is None or self._proc.stdin is None:
raise RuntimeError("ShellSession is not running; call start() first")
sentinel = f"__AF_END_{self._sentinel_tag}_{secrets.token_hex(4)}__"
script = self._build_script(command, sentinel)
# Snapshot current buffer positions so we only attribute output
# produced *after* the command is submitted.
stdout_offset = len(self._stdout_buf)
stderr_offset = len(self._stderr_buf)
self._stdout_event.clear()
started = time.monotonic()
try:
self._proc.stdin.write(script.encode("utf-8"))
await self._proc.stdin.drain()
except (ConnectionResetError, BrokenPipeError) as exc:
raise RuntimeError("persistent shell session is no longer alive") from exc
needle = sentinel.encode("utf-8")
timed_out = False
hard_cap = self._max_output_bytes * 4
async def _wait_for_sentinel() -> tuple[int, int]:
"""Return (sentinel_start_index, exit_code) once seen."""
while True:
idx = self._stdout_buf.find(needle, stdout_offset)
if idx != -1:
# Parse trailing ``_<digits>``.
tail_start = idx + len(needle)
# Wait briefly for the rc digits + newline to arrive.
deadline = time.monotonic() + 1.0
while time.monotonic() < deadline:
after = bytes(self._stdout_buf[tail_start:])
nl = after.find(b"\n")
if nl != -1:
break
if self._stdout_closed:
break
self._stdout_event.clear()
try:
await asyncio.wait_for(self._stdout_event.wait(), timeout=0.1)
except asyncio.TimeoutError:
pass
after = bytes(self._stdout_buf[tail_start:])
rc = _parse_rc(after)
return idx, rc
if self._stdout_closed:
raise RuntimeError("shell closed stdout before emitting sentinel")
if len(self._stdout_buf) - stdout_offset > hard_cap:
raise _SentinelOverflow
self._stdout_event.clear()
try:
await asyncio.wait_for(self._stdout_event.wait(), timeout=0.5)
except asyncio.TimeoutError:
# Keep spinning; timeout is enforced at the wait_for below.
pass
sentinel_idx: int
exit_code: int
try:
sentinel_idx, exit_code = await asyncio.wait_for(_wait_for_sentinel(), timeout=timeout)
except asyncio.TimeoutError:
timed_out = True
await self._interrupt_current_command()
try:
sentinel_idx, exit_code = await asyncio.wait_for(_wait_for_sentinel(), timeout=_SHUTDOWN_GRACE)
except (asyncio.TimeoutError, RuntimeError, _SentinelOverflow):
# Session is unrecoverable; tear it down so the next call
# gets a fresh subprocess.
await self.close()
duration_ms = int((time.monotonic() - started) * 1000)
stdout_bytes = bytes(self._stdout_buf[stdout_offset:])
stderr_bytes = bytes(self._stderr_buf[stderr_offset:])
stdout_str, so_trunc = _truncate_bytes(stdout_bytes, self._max_output_bytes)
stderr_str, se_trunc = _truncate_bytes(stderr_bytes, self._max_output_bytes)
return ShellResult(
stdout=stdout_str,
stderr=stderr_str,
exit_code=124,
duration_ms=duration_ms,
truncated=so_trunc or se_trunc,
timed_out=True,
)
except _SentinelOverflow:
# Runaway output; recover by interrupting and restarting.
await self._interrupt_current_command()
await self.close()
duration_ms = int((time.monotonic() - started) * 1000)
stdout_bytes = bytes(self._stdout_buf[stdout_offset : stdout_offset + hard_cap])
stderr_bytes = bytes(self._stderr_buf[stderr_offset:])
stdout_str, _ = _truncate_bytes(stdout_bytes, self._max_output_bytes)
stderr_str, _ = _truncate_bytes(stderr_bytes, self._max_output_bytes)
return ShellResult(
stdout=stdout_str,
stderr=stderr_str,
exit_code=-1,
duration_ms=duration_ms,
truncated=True,
timed_out=False,
)
# Let stderr quiesce briefly — late writes from the completing
# command otherwise leak into the *next* run().
await asyncio.sleep(_STDERR_QUIESCENCE)
duration_ms = int((time.monotonic() - started) * 1000)
stdout_raw = bytes(self._stdout_buf[stdout_offset:sentinel_idx])
stderr_raw = bytes(self._stderr_buf[stderr_offset:])
stdout_text = stdout_raw.decode("utf-8", errors="replace").rstrip("\r\n")
stderr_text = stderr_raw.decode("utf-8", errors="replace")
stdout_str, stdout_truncated = _truncate_text(stdout_text, self._max_output_bytes)
stderr_str, stderr_truncated = _truncate_text(stderr_text, self._max_output_bytes)
# Trim the persistent buffers: everything we needed has been copied
# into stdout_raw/stderr_raw above, so discarding now keeps the
# session's memory bounded across many commands. The reader tasks
# only ever ``extend()`` these buffers (no offset bookkeeping
# outside this method), so resetting them here is safe.
del self._stdout_buf[:]
del self._stderr_buf[:]
return ShellResult(
stdout=stdout_str,
stderr=stderr_str,
exit_code=exit_code,
duration_ms=duration_ms,
truncated=stdout_truncated or stderr_truncated,
timed_out=timed_out,
)
# ------------------------------------------------------------------ helpers
def _build_script(self, command: str, sentinel: str) -> str:
if self._is_pwsh:
# Base64-encode the command and run it via Invoke-Expression to
# work around pwsh's whole-script parse requirement on stdin.
# $ErrorActionPreference is set to 'Stop' at session start so
# the catch block fires on cmdlet errors as well as parse
# failures surfaced by Invoke-Expression itself.
encoded = base64.b64encode(command.encode("utf-8")).decode("ascii")
return (
"& {"
" $__af_rc = 0;"
" try {"
f" $__af_cmd = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String('{encoded}'));"
" Invoke-Expression $__af_cmd;"
" if ($LASTEXITCODE -ne $null) { $__af_rc = $LASTEXITCODE }"
" elseif (-not $?) { $__af_rc = 1 }"
" } catch {"
" [Console]::Error.WriteLine($_.ToString());"
" $__af_rc = 1"
" } finally {"
f" [Console]::WriteLine('{sentinel}_' + $__af_rc);"
" [Console]::Out.Flush()"
" }"
" }\n"
)
# POSIX shell. Run the user command in a brace-group so its exit
# status is captured even if the user previously enabled ``set -e``
# — we save and restore the prior errexit state around the trailer
# so ``set -e`` (and other shell options) persist across commands
# exactly as the user configured them.
return (
f"__af_e=$-; set +e; {{ {command}\n}}; __af_rc=$?;"
f' case "$__af_e" in *e*) set -e;; esac;'
f" printf '\\n{sentinel}_%s\\n' \"$__af_rc\"\n"
)
async def _write_raw(self, text: str) -> None:
if self._proc is None or self._proc.stdin is None:
return
self._proc.stdin.write(text.encode("utf-8"))
await self._proc.stdin.drain()
async def _reader(
self,
stream: asyncio.StreamReader,
buf: bytearray,
*,
is_stdout: bool,
) -> None:
"""Persistent reader task: drains ``stream`` into ``buf`` until EOF."""
try:
while True:
chunk = await stream.read(_READ_CHUNK)
if not chunk:
if is_stdout:
self._stdout_closed = True
self._stdout_event.set()
return
buf.extend(chunk)
if is_stdout:
self._stdout_event.set()
except asyncio.CancelledError:
raise
except Exception:
if is_stdout:
self._stdout_closed = True
self._stdout_event.set()
async def _interrupt_current_command(self) -> None:
if self._proc is None or self._proc.returncode is not None:
return
try:
if sys.platform == "win32":
self._proc.send_signal(signal.CTRL_BREAK_EVENT) # type: ignore[attr-defined]
else:
os.killpg(os.getpgid(self._proc.pid), signal.SIGINT)
except (ProcessLookupError, PermissionError, OSError):
pass
def _parse_rc(after: bytes) -> int:
"""Parse ``_<digits>`` trailing the sentinel. Returns -1 on failure."""
if not after.startswith(b"_"):
return -1
digits = bytearray()
for b in after[1:]:
if b == ord("\n") or b == ord("\r"):
break
if 48 <= b <= 57 or b == ord("-"):
digits.append(b)
else:
break
if not digits:
return -1
try:
return int(digits.decode("ascii"))
except ValueError:
return -1
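The sentinel protocol used by `_run_locked` and `_parse_rc` can be illustrated without a live subprocess. The sketch below assumes a POSIX shell and works on a simulated stdout buffer. `new_sentinel`, `build_posix_script`, and `split_at_sentinel` are hypothetical names for this example; the trailer string follows the POSIX branch of `_build_script`.

```python
import secrets


def new_sentinel() -> str:
    """Random marker that is very unlikely to occur in real command output."""
    return f"__AF_END_{secrets.token_hex(8)}_{secrets.token_hex(4)}__"


def build_posix_script(command: str, sentinel: str) -> str:
    """Wrap a command so the shell prints '<sentinel>_<rc>' when it finishes."""
    return (
        f"__af_e=$-; set +e; {{ {command}\n}}; __af_rc=$?;"
        f' case "$__af_e" in *e*) set -e;; esac;'
        f" printf '\\n{sentinel}_%s\\n' \"$__af_rc\"\n"
    )


def split_at_sentinel(buf: bytes, sentinel: str):
    """Return (stdout_text, exit_code), or None while the trailer is incomplete."""
    needle = sentinel.encode("utf-8")
    idx = buf.find(needle)
    if idx == -1:
        return None
    tail = buf[idx + len(needle):]
    if not tail.startswith(b"_") or b"\n" not in tail:
        return None  # rc digits or newline have not arrived yet
    rc_text = tail[1:tail.index(b"\n")].strip()
    try:
        rc = int(rc_text.decode("ascii"))
    except ValueError:
        rc = -1
    stdout = buf[:idx].decode("utf-8", errors="replace").rstrip("\r\n")
    return stdout, rc


sentinel = new_sentinel()
script = build_posix_script("echo hello; false", sentinel)
# Simulated bytes a shell would write for a command that exits with status 7.
simulated = f"hello\nworld\n\n{sentinel}_7\n".encode("utf-8")
print(split_at_sentinel(b"hello\n", sentinel))
print(split_at_sentinel(simulated, sentinel))
```

Returning `None` for a partial buffer mirrors the loop above, which keeps waiting on the stdout event until both the sentinel and its trailing newline are present.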
class _SentinelOverflow(RuntimeError):
"""Internal signal that output exceeded the hard cap before the sentinel was seen."""
@@ -0,0 +1,317 @@
# Copyright (c) Microsoft. All rights reserved.
"""High-level :class:`LocalShellTool` facade."""
from __future__ import annotations
import asyncio
import logging
import os
from collections.abc import Callable, Mapping, Sequence
from typing import Literal
from agent_framework import FunctionTool, tool
from agent_framework._tools import SHELL_TOOL_KIND_VALUE
from ._executor import run_stateless
from ._policy import ShellPolicy, ShellRequest
from ._resolve import is_powershell, resolve_shell
from ._session import ShellSession
from ._types import ShellCommandError, ShellMode, ShellResult
logger = logging.getLogger(__name__)
def _quote_posix(value: str) -> str:
r"""Return ``value`` wrapped in POSIX single quotes.
Single-quoted strings have no interpolation in POSIX shells. Embedded
single quotes are handled by closing the quote, inserting an escaped
quote, and reopening: ``a'b`` becomes ``'a'\''b'``.
"""
return "'" + value.replace("'", "'\\''") + "'"
def _quote_powershell(value: str) -> str:
"""Return ``value`` wrapped in PowerShell single quotes.
Single-quoted strings in PowerShell are literal — ``$`` and ``"`` carry
no special meaning. Embedded single quotes are doubled (``''``).
"""
return "'" + value.replace("'", "''") + "'"
_PERSISTENT_DESCRIPTION = (
"Execute a single shell command on the local machine and return its "
"stdout, stderr, and exit code. Commands run in a persistent session so "
"`cd` and environment variables from previous calls are preserved. "
"Approval is required by default."
)
_STATELESS_DESCRIPTION = (
"Execute a single shell command on the local machine and return its "
"stdout, stderr, and exit code. Each command runs in a fresh subprocess, "
"so `cd` and environment variables do not persist between calls. "
"Approval is required by default."
)
def _default_description(mode: ShellMode) -> str:
return _PERSISTENT_DESCRIPTION if mode == "persistent" else _STATELESS_DESCRIPTION
class LocalShellTool:
"""Cross-OS local shell tool that plugs into any agent-framework chat client.
Typical use::
shell = LocalShellTool()
agent = Agent(
client=client,
tools=[client.get_shell_tool(func=shell.as_function())],
)
Or as an async context manager (recommended in persistent mode so the
session is cleaned up on exit)::
async with LocalShellTool() as shell:
...
**Single-session ownership.** A persistent-mode :class:`LocalShellTool`
is owned by a single conversation / agent session — i.e. one user.
The backing shell process carries mutable state (cwd, exported
variables, shell history, background jobs) that every subsequent
command can observe, and a single stdin/stdout pipe serializes every
call. Do not share one instance across users, tenants, or concurrent
conversations: state leaks between them and commands queue behind
each other. Create one tool per session, close it (or use ``async
with``) when the session ends. If a shared instance is genuinely
required, construct with ``mode="stateless"`` so each call spawns a
fresh subprocess.
Args:
mode: ``"persistent"`` (default) keeps a single long-lived shell
subprocess so ``cd`` / ``export`` carry across calls.
``"stateless"`` spawns a fresh subprocess per call.
shell: Optional shell argv override. String values are tokenised.
When omitted, the platform default is used (``pwsh`` or
``powershell`` on Windows, ``bash`` or ``sh`` on Unix). May also
be overridden via the ``AGENT_FRAMEWORK_SHELL`` env var.
workdir: Working directory for commands. Defaults to the current
working directory. In persistent mode, each command is
re-anchored to this directory when ``confine_workdir=True`` —
see that argument for the exact semantics and caveats.
confine_workdir: When ``True`` (default), each command in persistent
mode is prefixed with a ``cd`` back into ``workdir`` so
``cd``-wandering in one call does not leak to the next. This is
a **re-anchor**, not a hard confinement — a command that does
``cd /tmp && rm -rf .`` in one call can still touch ``/tmp``.
Use :class:`ShellPolicy` or a sandboxed executor for true
confinement.
env: Extra environment variables for the shell subprocess. By default
they are merged over ``os.environ``, with the supplied values winning.
With ``clean_env=True`` they replace the inherited environment entirely.
clean_env: When ``True``, do **not** inherit ``os.environ``; only
the variables supplied in ``env`` are visible to commands.
policy: Policy applied before approval. Defaults to an empty
:class:`ShellPolicy`, which allows every non-empty command; supply
explicit ``denylist``/``allowlist`` patterns to filter. The
policy is a UX pre-filter, not a security boundary: approval
gating and the sandbox tier are the real defenses.
timeout: Per-command timeout in seconds. ``None`` disables. Default
30 s.
max_output_bytes: Byte cap applied to stdout and to stderr separately
before head/tail truncation. Default 64 KiB.
approval_mode: ``"always_require"`` (default) or ``"never_require"``.
Controls the ``FunctionTool.approval_mode`` on the returned
function, which the framework uses to gate execution via
``user_input_requests``. **Approval is the actual security
boundary of this tool** — disabling it requires
``acknowledge_unsafe=True``.
acknowledge_unsafe: Required to be ``True`` if you set
``approval_mode="never_require"``. ``ShellPolicy`` is a UX
pre-filter, not a security boundary; without approval the tool
will execute any command the model emits.
on_command: Optional audit hook called with the command string for
every command that passes policy. Use for logging / telemetry.
"""
def __init__(
self,
*,
mode: ShellMode = "persistent",
shell: str | Sequence[str] | None = None,
workdir: str | os.PathLike[str] | None = None,
confine_workdir: bool = True,
env: Mapping[str, str] | None = None,
clean_env: bool = False,
policy: ShellPolicy | None = None,
timeout: float | None = 30.0,
max_output_bytes: int = 64 * 1024,
approval_mode: Literal["always_require", "never_require"] = "always_require",
acknowledge_unsafe: bool = False,
on_command: Callable[[str], None] | None = None,
) -> None:
if mode not in ("persistent", "stateless"):
raise ValueError(f"mode must be 'persistent' or 'stateless', got {mode!r}")
if approval_mode == "never_require" and not acknowledge_unsafe:
raise ValueError(
"Setting approval_mode='never_require' disables the only built-in "
"security boundary of LocalShellTool. If you understand the risk "
"(arbitrary commands run on the host with the agent's privileges; "
"ShellPolicy is a UX pre-filter, not a defense), pass "
"acknowledge_unsafe=True. For untrusted input prefer a "
"sandboxed executor (e.g. DockerShellTool or HyperlightCodeActProvider)."
)
self._mode: ShellMode = mode
self._shell_override = shell
self._workdir: str | None = os.fspath(workdir) if workdir is not None else None
self._confine_workdir = confine_workdir
self._policy = policy or ShellPolicy()
self._timeout = timeout
self._max_output_bytes = max_output_bytes
self._approval_mode: Literal["always_require", "never_require"] = approval_mode
self._on_command = on_command
merged_env: dict[str, str] | None
if env is None and not clean_env:
merged_env = None # inherit
elif clean_env:
merged_env = dict(env) if env is not None else {}
else:
merged_env = {**os.environ, **dict(env or {})}
self._env = merged_env
self._interactive_argv = resolve_shell(self._shell_override, interactive=True)
self._stateless_argv = resolve_shell(self._shell_override, interactive=False)
self._session: ShellSession | None = None
self._session_lock: asyncio.Lock | None = None
def _get_session_lock(self) -> asyncio.Lock:
# Lazily create in the running loop so construction outside a loop is fine.
if self._session_lock is None:
self._session_lock = asyncio.Lock()
return self._session_lock
# ------------------------------------------------------------------ lifecycle
async def start(self) -> None:
"""Eagerly spawn the persistent session (no-op in stateless mode)."""
if self._mode != "persistent":
return
async with self._get_session_lock():
if self._session is None:
self._session = ShellSession(
self._interactive_argv,
workdir=self._workdir,
env=self._env,
max_output_bytes=self._max_output_bytes,
)
await self._session.start()
async def close(self) -> None:
"""Terminate the persistent session if any."""
async with self._get_session_lock():
if self._session is not None:
try:
await self._session.close()
finally:
self._session = None
async def __aenter__(self) -> LocalShellTool:
await self.start()
return self
async def __aexit__(self, *_exc: object) -> None:
await self.close()
# ------------------------------------------------------------------ core run
async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
"""Execute ``command`` directly and return its :class:`ShellResult`.
Applies policy and the audit hook, but **not** approval (that is
handled by the framework when this tool is wrapped via
:meth:`as_function`).
Args:
command: The shell command to execute.
timeout: Optional per-call timeout in seconds that overrides
the tool's configured default. When ``None``, the tool's
``timeout`` setting is used. The timeout is enforced
inside the executor (the subprocess is killed / the
persistent session tears down the command on timeout)
so callers do not need to wrap this call in
:func:`asyncio.wait_for`.
"""
request = ShellRequest(command=command, workdir=self._workdir)
decision = self._policy.evaluate(request)
if decision.decision == "deny":
raise ShellCommandError(f"Command rejected by policy: {decision.reason}")
if self._on_command is not None:
try:
self._on_command(command)
except Exception:
logger.exception("on_command hook raised")
effective_timeout = self._timeout if timeout is None else timeout
if self._mode == "persistent":
if self._session is None:
await self.start()
if self._session is None:
raise RuntimeError("LocalShellTool session failed to start")
effective = self._maybe_reanchor(command)
return await self._session.run(effective, timeout=effective_timeout)
return await run_stateless(
self._stateless_argv,
command,
workdir=self._workdir,
env=self._env,
timeout=effective_timeout,
max_output_bytes=self._max_output_bytes,
)
# ------------------------------------------------------------------ AF wiring
def as_function(
self,
*,
name: str = "run_shell",
description: str | None = None,
) -> FunctionTool:
"""Return an :class:`~agent_framework.FunctionTool` bound to this instance.
The returned tool has ``kind="shell"`` so provider-specific
``get_shell_tool(func=...)`` factories recognise it as a local shell.
"""
async def _run_shell(command: str) -> str:
try:
result = await self.run(command)
except ShellCommandError as exc:
return str(exc)
return result.format_for_model()
effective_description = description or _default_description(self._mode)
_run_shell.__doc__ = effective_description
return tool(
func=_run_shell,
name=name,
description=effective_description,
approval_mode=self._approval_mode,
kind=SHELL_TOOL_KIND_VALUE,
)
# ------------------------------------------------------------------ helpers
def _maybe_reanchor(self, command: str) -> str:
"""Prefix ``cd`` when confinement is enabled and workdir is set."""
if not self._confine_workdir or self._workdir is None:
return command
# Idempotent prefix: cd back before each command so a `cd` in one
# call does not leak workdir state to the next.
if self._interactive_argv and is_powershell(self._interactive_argv):
return f"Set-Location -LiteralPath {_quote_powershell(self._workdir)}\n{command}"
return f"cd -- {_quote_posix(self._workdir)}\n{command}"
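The re-anchor prefix is only safe because the working directory is quoted before it is spliced into the script. The sketch below copies the two quoting rules into standalone functions (`quote_posix`, `quote_powershell`, and `reanchor` are illustrative names) and uses `shlex.split` as a rough stand-in for POSIX word splitting to check that a hostile path stays a single argument.

```python
import shlex


def quote_posix(value: str) -> str:
    """POSIX single quotes; an embedded quote becomes '\\'' (close, escape, reopen)."""
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell(value: str) -> str:
    """PowerShell single quotes; an embedded quote is doubled."""
    return "'" + value.replace("'", "''") + "'"


def reanchor(command: str, workdir: str, *, powershell: bool = False) -> str:
    """Prefix the command with a cd back into workdir, quoted per shell."""
    if powershell:
        return f"Set-Location -LiteralPath {quote_powershell(workdir)}\n{command}"
    return f"cd -- {quote_posix(workdir)}\n{command}"


# A path containing a quote, a variable reference, and a command separator.
hostile = "/tmp/it's $HOME; echo pwned"
first_line = reanchor("ls", hostile).splitlines()[0]
print(first_line)
print(shlex.split(first_line) == ["cd", "--", hostile])
print(reanchor("dir", "C:\\it's", powershell=True).splitlines()[0])
```

The `--` before the path keeps a directory name that starts with `-` from being read as an option to `cd`.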
@@ -0,0 +1,41 @@
# Copyright (c) Microsoft. All rights reserved.
"""Head/tail UTF-8 byte truncation shared by the local and Docker shell tools."""
from __future__ import annotations
def truncate_head_tail(data: bytes, cap: int) -> tuple[str, bool]:
"""Truncate ``data`` to ``cap`` bytes, keeping a head and a tail slice.
Distributes the budget so head receives ``cap // 2`` bytes and tail
receives ``cap - cap // 2`` bytes. For an odd ``cap`` the tail keeps
the extra byte, so head and tail together always total exactly ``cap`` bytes.
Returns the joined (head, marker, tail) string and a boolean indicating
whether truncation occurred.
Raises ``ValueError`` if ``cap`` is not positive — a non-positive
cap has no consistent meaning here and silently treating it as
"unlimited" would defeat output-capping assumptions in callers.
"""
if cap <= 0:
raise ValueError(f"cap must be positive; got {cap}")
if len(data) <= cap:
return data.decode("utf-8", errors="replace"), False
head_cap = cap // 2
tail_cap = cap - head_cap
head = data[:head_cap].decode("utf-8", errors="replace")
tail = data[len(data) - tail_cap :].decode("utf-8", errors="replace")
dropped = len(data) - cap
return f"{head}\n[... truncated {dropped} bytes ...]\n{tail}", True
def truncate_text_head_tail(text: str, cap: int) -> tuple[str, bool]:
"""``truncate_head_tail`` for already-decoded text.
Encodes ``text`` as UTF-8, applies the byte-budgeted head/tail split,
and returns a string. UTF-8 decode with ``errors="replace"`` ensures
truncation that lands mid-codepoint cannot raise.
"""
return truncate_head_tail(text.encode("utf-8"), cap)
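A short worked example makes the head/tail budget concrete. The function below is a standalone copy of the logic above under the illustrative name `truncate_sketch`, shown with an odd cap and with a cut that lands inside a multi-byte UTF-8 character.

```python
def truncate_sketch(data: bytes, cap: int) -> tuple[str, bool]:
    """Keep cap // 2 head bytes and the remaining budget as tail bytes."""
    if cap <= 0:
        raise ValueError(f"cap must be positive; got {cap}")
    if len(data) <= cap:
        return data.decode("utf-8", errors="replace"), False
    head_cap = cap // 2
    tail_cap = cap - head_cap
    head = data[:head_cap].decode("utf-8", errors="replace")
    tail = data[len(data) - tail_cap:].decode("utf-8", errors="replace")
    dropped = len(data) - cap
    return f"{head}\n[... truncated {dropped} bytes ...]\n{tail}", True


# Odd cap: 2 head bytes, 3 tail bytes, 5 bytes dropped from the middle.
print(truncate_sketch(b"abcdefghij", 5))
# Under the cap: returned unchanged.
print(truncate_sketch(b"abc", 5))
# Each "é" is 2 bytes; a 1-byte head splits one, so it decodes to U+FFFD.
print(truncate_sketch("ééé".encode("utf-8"), 3))
```

Note that the marker line itself is not counted against `cap`, so the returned string can be slightly longer than the cap.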
@@ -0,0 +1,59 @@
# Copyright (c) Microsoft. All rights reserved.
"""Shared types for the local shell tool."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Literal
ShellMode = Literal["persistent", "stateless"]
@dataclass(frozen=True)
class ShellResult:
"""The outcome of a single shell command invocation.
Attributes:
stdout: Captured standard output, possibly truncated.
stderr: Captured standard error, possibly truncated.
exit_code: The exit status reported by the shell or subprocess.
duration_ms: How long the command took, in milliseconds.
truncated: ``True`` when stdout or stderr was truncated to fit
``max_output_bytes``.
timed_out: ``True`` when the command was killed because it exceeded
the configured timeout.
"""
stdout: str
stderr: str
exit_code: int
duration_ms: int
truncated: bool = False
timed_out: bool = False
def format_for_model(self) -> str:
"""Format the result as a single text block suitable for an LLM."""
parts: list[str] = []
if self.stdout:
parts.append(self.stdout)
if self.stderr:
parts.append(f"stderr: {self.stderr}")
if self.truncated:
parts.append("[output truncated]")
if self.timed_out:
parts.append("[command timed out]")
parts.append(f"exit_code: {self.exit_code}")
return "\n".join(parts)
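To show what the model actually receives, here is a standalone sketch that copies the dataclass and its formatter under the illustrative name `ResultSketch` and renders a clean run and a timed-out run. The sample values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultSketch:
    # Illustrative copy of the ShellResult fields shown above.
    stdout: str
    stderr: str
    exit_code: int
    duration_ms: int
    truncated: bool = False
    timed_out: bool = False

    def format_for_model(self) -> str:
        """Join the non-empty parts, always ending with the exit code."""
        parts: list[str] = []
        if self.stdout:
            parts.append(self.stdout)
        if self.stderr:
            parts.append(f"stderr: {self.stderr}")
        if self.truncated:
            parts.append("[output truncated]")
        if self.timed_out:
            parts.append("[command timed out]")
        parts.append(f"exit_code: {self.exit_code}")
        return "\n".join(parts)


ok = ResultSketch(stdout="hi", stderr="", exit_code=0, duration_ms=12)
late = ResultSketch(
    stdout="", stderr="boom", exit_code=124, duration_ms=30000,
    truncated=True, timed_out=True,
)
print(ok.format_for_model())
print(late.format_for_model())
```

Empty streams are omitted entirely, so a silent successful command renders as just its exit code line.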
class ShellExecutionError(RuntimeError):
"""Base class for shell-tool execution failures."""
class ShellTimeoutError(ShellExecutionError):
"""Raised when a command exceeds the configured timeout."""
class ShellCommandError(ShellExecutionError):
"""Raised when a command is rejected by the configured policy."""