Python: Shell tool with support for local and Docker (#5664)

* feat(tools): add cross-OS LocalShellTool in new agent-framework-tools package Introduces a safe, cross-OS local shell tool as the first citizen of a new agent-framework-tools workspace package. Supports persistent (default) and stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist, allowlist, approval gating, process-tree kill on timeout, output truncation, and audit hooks. Integrates with existing provider get_shell_tool(func=...) factories via FunctionTool kind='shell'. See docs/decisions/0026-builtin-tools-local-shell.md for the full design. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(tools): security hardening for LocalShellTool Codifies what LocalShellTool does and does not defend against, and delegates the security-relevant lifecycle primitive to a battle-tested library instead of hand-rolled per-OS code. Changes: - Adopt psutil for cross-OS process-tree termination (executor + session). Replaces hand-rolled taskkill/killpg with one canonical implementation. - Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH poisoning cannot redirect us to an attacker-supplied binary. - Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail, not a security boundary. - Require acknowledge_unsafe=True to set approval_mode='never_require', making the unsafe path explicitly opt-in with a self-documenting name. - Add tests/test_security.py codifying named CVE-style cases. Defenses we DO claim are asserted; non-defenses (denylist bypasses via backslash insertion, variable expansion, interpreter escape, base64, alternative tools, PowerShell-native verbs) are documented as expected-to-pass tests so residual risk stays visible. - Add Threat Model + Confidence Strategy sections to ADR 0026. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(tools): add DockerShellTool sandboxed shell tier Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool. Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework. Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell. Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(tools): backport shell-tool fixes from .NET parity review Applies the applicable subset of bug fixes accumulated during the .NET shell-tool PR review (microsoft/agent-framework#5604) to the Python shell tool. A1 - Quote workdir safely in _maybe_reanchor Previously _tool.py used double-quote interpolation when emitting the cd/Set-Location prefix, which expanded $VAR, $(), and backticks in the workdir path. A workdir containing shell metacharacters could trigger arbitrary command execution before the user command ran. Replaced with single-quote escaping helpers _quote_posix and _quote_powershell that emit literal-string forms safe for both hosts. A5/A6 - Consolidate truncation to a single byte-aware helper Extracted a shared truncate_head_tail / truncate_text_head_tail helper in _truncate.py. The new implementation distributes odd caps so head receives floor(cap/2) and tail receives ceil(cap/2) bytes, matching the .NET round-9 fix and ensuring no input bytes are silently dropped on the boundary. _session.py previously truncated by Python str length while the caller passed _max_output_bytes - the unit mismatch is now gone: raw byte buffers go through truncate_head_tail and decoded text goes through truncate_text_head_tail. Unit tests added for the truncate and quote helpers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(tools): tone down narrative and overconfident comments in shell tool The shell tool's docstrings and comments contained two patterns that the .NET review pushed back on: - Narrative framing about implementation history ("hard-won", "we sidestep", "design inspiration: ...", competitor framework name-drops in module docstrings). - Overstated security guarantees ("battle-tested", "reasonable for untrusted input", "recommended executor for any agent that runs commands from untrusted input", "destructive commands are blocked", "safe local shell tool", "blocks shell injection"). Rewrites the affected docstrings and comments to describe what the code does in neutral terms. Behaviour is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(tools): add ShellEnvironmentProvider for the Python shell tool Ports the .NET ShellEnvironmentProvider as a Python ContextProvider so agents using LocalShellTool or DockerShellTool can be primed with an accurate description of the shell they're talking to (family, version, OS, working directory, and which CLIs are available). The provider runs probes through any ShellExecutor, caches the resulting snapshot, and on every before_run extends the session instructions with a markdown block describing the shell idiom to use. A failed first probe leaves the cache empty so the next call retries (no permanent poisoning). Probe failures from a narrow set of expected error types (ShellCommandError, ShellExecutionError, ShellTimeoutError, and asyncio.TimeoutError from the per-probe timeout) are recorded as None fields in the snapshot. Other exceptions propagate. Tool names are validated against ^[A-Za-z0-9._-]+$ before being interpolated into a probe command. Includes 12 unit tests covering happy path, stderr fallback, timeout handling, expected/unexpected exception paths, malicious tool name rejection, case-insensitive deduplication, retry after failure, concurrent first-callers sharing one probe, and the default and custom formatter paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(tools): document ShellEnvironmentProvider and finish comment cleanup Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor(tools): move shell samples to python/samples/02-agents/tools The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(tools): cover confine_workdir default and ShellResult.format_for_model Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(tools): address PR #5664 review feedback - _tool.py: detect PowerShell via is_powershell() helper instead of basename string match - _environment.py: use public ContextProvider import (no private _ prefix) - _session.py: trim _stdout_buf/_stderr_buf after copying to avoid unbounded retention across calls - _docker.py: short-circuit start()/close() in stateless mode; add configurable shell kwarg (default bash, e.g. 'sh' for alpine) - tests: parenthesized multi-line assert; alpine integration tests now pass shell='sh' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(tools): satisfy CI quality gates - pyupgrade: drop quoted self-class refs in __aenter__/method annotations - ruff format: reflow long lines per workspace style - pyright: assert psutil non-None in optional-import branch; lowercase mutable module globals; annotate _approval_mode as Literal so tool() Literal-typed kwarg is accepted; add ... body to ShellExecutor.run protocol; remove unused deprecated _kill_tree wrapper - tests: skip docker integration tests on win32 (Windows containers don't support --read-only / alpine images) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Remove DEFAULT_DENYLIST; document single-session ownership; fix bandit findings Mirrors the .NET PR #5604 cleanup: - Remove DEFAULT_DENYLIST from ShellPolicy. ShellPolicy() now ships with an empty deny-list; operators opt into site-specific patterns explicitly. No major agent framework uses regex matching as a primary security control; AutoGen v2 removed theirs. Approval gating + sandbox tier remain the real boundaries. - Rewrite module / class docstrings to frame ShellPolicy as a UX pre-filter, not a security control. - Add Single-session ownership paragraphs to ShellExecutor, ShellSession, LocalShellTool, and DockerShellTool: a persistent-mode tool is owned by exactly one conversation / agent session; do not share across users or concurrent conversations. - Tests now supply explicit deny patterns instead of relying on a default. - Address Pre-commit Hooks (bandit) CI failures: convert internal-invariant asserts to explicit RuntimeError, annotate intentional subprocess/shell usage with # nosec, document container-internal /tmp paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR #5664 round-2 review feedback Deny-list documentation drift: - README and the OpenAI/local-shell sample no longer claim a built-in deny-list of destructive commands. ShellPolicy is described as an optional, operator-supplied UX pre-filter; the real boundaries remain approval gating and the sandbox tier. Behavioural fixes called out in review: - ShellPolicy.evaluate() now denies empty / whitespace-only commands explicitly instead of returning allow with no rationale. - truncate_head_tail() raises ValueError for cap <= 0 instead of silently returning the full input with truncated=False, which previously could defeat output-capping in callers that mis-configured the budget. - LocalShellTool.as_function() / DockerShellTool.as_function() return the ShellCommandError text directly so the model sees a single, non-redundant 'Command rejected by policy: …' message instead of the prior duplicated 'Command blocked by policy: Command rejected …' wrapping. - ShellSession POSIX sentinel trailer now snapshots and restores the prior errexit (set -e) state around the trailer, so a user 'set -e' in the persistent shell is no longer permanently disabled by the next run(). Tests: - New test_shell_parse_rc.py covers the full _parse_rc() edge-case surface (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits). - test_policy.py asserts the new empty-command deny. - test_shell_truncate_and_quote.py asserts ValueError for cap=0 and cap<0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR review feedback for shell tool - _resolve.py: reject empty/whitespace shell override string - _tool.py / _docker.py: mode-aware default tool description (persistent vs stateless) - _tool.py: fix misleading workdir docstring (re-anchor, not blocking) - _types.py: emit stream-agnostic [output truncated] marker - _policy.py: declare _denies/_allows as dataclass fields - _environment.py: use $(pwd) instead of $PWD in POSIX probe Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR review feedback: shell override flag + probe timeout safety - _resolve.py: in stateless mode, ensure shell overrides end with -c/-Command so commands aren't misinterpreted as script-file paths. - ShellExecutor.run / LocalShellTool.run / DockerShellTool.run now accept an optional imeout kwarg; ShellEnvironmentProvider drops the outer asyncio.wait_for and lets the executor enforce the probe timeout internally, so cancellation no longer risks leaving a hung subprocess or corrupted session. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback: docker isolation + lifecycle robustness - pyproject.toml: bump agent-framework-core minimum from 1.2.0 to 1.2.2 to align with the rest of the workspace. - _docker.py: validate extra_run_args at construction time and reject flags that would dismantle the isolation defaults (--privileged, --cap-add, --security-opt, --network/--net, -v/--volume/--mount, --device, --pid, --ipc, --userns, --user, --read-only, --tmpfs, --add-host, --gpus, --cgroupns, --device-cgroup-rule); also documented the warning on the docstring. - _docker._stop_container: retry docker rm -f once and log a warning/error when it does not succeed, so operators can audit leaked containers instead of getting a silent success. - _docker._run_stateless timeout path: fall back to docker rm -f when docker kill fails or times out (--rm only reaps on clean exit), and log instead of silently swallowing communicate() errors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: alliscode <bentho@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-05-21 17:29:59 -07:00
parent afd2739e38
commit 8e54f0b0e7
32 changed files with 4269 additions and 63 deletions
@@ -0,0 +1,14 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Built-in tools for the Microsoft Agent Framework."""
+
+from __future__ import annotations
+
+import importlib.metadata
+
+try:
+    __version__ = importlib.metadata.version(__name__)
+except importlib.metadata.PackageNotFoundError:
+    __version__ = "0.0.0"
+
+__all__ = ["__version__"]
@@ -0,0 +1,53 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Cross-platform local shell tool for the Microsoft Agent Framework."""
+
+from __future__ import annotations
+
+from ._docker import (
+    DEFAULT_IMAGE as DOCKER_DEFAULT_IMAGE,
+)
+from ._docker import (
+    DockerNotAvailableError,
+    DockerShellTool,
+    is_docker_available,
+)
+from ._environment import (
+    ShellEnvironmentProvider,
+    ShellEnvironmentProviderOptions,
+    ShellEnvironmentSnapshot,
+    ShellFamily,
+    default_instructions_formatter,
+)
+from ._executor_base import ShellExecutor
+from ._policy import ShellDecision, ShellPolicy, ShellRequest
+from ._tool import LocalShellTool
+from ._types import (
+    ShellCommandError,
+    ShellExecutionError,
+    ShellMode,
+    ShellResult,
+    ShellTimeoutError,
+)
+
+__all__ = [
+    "DOCKER_DEFAULT_IMAGE",
+    "DockerNotAvailableError",
+    "DockerShellTool",
+    "LocalShellTool",
+    "ShellCommandError",
+    "ShellDecision",
+    "ShellEnvironmentProvider",
+    "ShellEnvironmentProviderOptions",
+    "ShellEnvironmentSnapshot",
+    "ShellExecutionError",
+    "ShellExecutor",
+    "ShellFamily",
+    "ShellMode",
+    "ShellPolicy",
+    "ShellRequest",
+    "ShellResult",
+    "ShellTimeoutError",
+    "default_instructions_formatter",
+    "is_docker_available",
+]
@@ -0,0 +1,700 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Sandboxed shell tool backed by a Docker (or compatible) container runtime.
+
+``DockerShellTool`` exposes the same public surface as
+:class:`LocalShellTool` but executes commands inside a container. The
+container is intended to be the security boundary; effective isolation
+depends on the host runtime configuration, image contents, and the flags
+passed at launch.
+
+Default flags applied at launch:
+
+- ``--network none``: no host or external network access.
+- ``--user 65534:65534``: runs as ``nobody:nogroup``.
+- ``--read-only`` root filesystem; only the optional ``host_workdir``
+  mount is writable when ``mount_readonly=False``.
+- ``--memory``, ``--pids-limit`` set to bounded values.
+- ``--cap-drop=ALL`` and ``--security-opt=no-new-privileges``.
+- ``--tmpfs /tmp`` so commands that need scratch space have somewhere to
+  write that doesn't escape the container.
+
+Persistent mode reuses :class:`ShellSession` by launching
+``docker exec -i <container> bash`` as the long-lived shell — the
+sentinel protocol works unchanged because the session is still talking
+to a bash REPL over pipes.
+
+**Single-session ownership.** In persistent mode a
+:class:`DockerShellTool` owns a long-lived container plus the bash REPL
+inside it. The container's filesystem, environment, working directory,
+and any artifacts the agent has produced are visible to every subsequent
+command, and a single stdin/stdout pipe serializes every call. A
+persistent-mode tool is therefore intended to be owned by exactly one
+conversation / agent session — i.e. one user. Do not share one instance
+across users, tenants, or concurrent conversations: their state leaks
+together inside the container and commands queue behind each other.
+Create one tool per session and close it (or use ``async with``) when
+the session ends; closing stops and removes the container. If a shared
+instance is genuinely required, use ``mode="stateless"`` so each call
+gets its own throwaway ``docker run --rm``.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import contextlib
+import logging
+import os
+import secrets
+import shutil
+import subprocess  # noqa: S404  # nosec B404 - running shell commands is the whole point of this tool
+import time
+from collections.abc import Callable, Mapping, Sequence
+from typing import Literal
+
+from agent_framework import FunctionTool, tool
+from agent_framework._tools import SHELL_TOOL_KIND_VALUE
+
+from ._policy import ShellPolicy, ShellRequest
+from ._session import ShellSession
+from ._truncate import truncate_head_tail as _truncate_bytes
+from ._types import ShellCommandError, ShellMode, ShellResult
+
+logger = logging.getLogger(__name__)
+
+
+DEFAULT_IMAGE = "mcr.microsoft.com/azurelinux/base/core:3.0"
+DEFAULT_CONTAINER_USER = "65534:65534"  # nobody:nogroup on most distros
+DEFAULT_NETWORK = "none"
+DEFAULT_MEMORY = "512m"
+DEFAULT_PIDS_LIMIT = 256
+DEFAULT_WORKDIR = "/workspace"
+
+# Docker run flags that would silently dismantle the isolation defaults
+# (--cap-drop ALL, --security-opt no-new-privileges, --network none,
+# --read-only root, pids/memory caps) if spliced in via ``extra_run_args``.
+# These are rejected at construction time so the failure surfaces loudly
+# rather than as a silently-unsandboxed container at runtime.
+_BLOCKED_EXTRA_RUN_FLAGS: tuple[str, ...] = (
+    "--privileged",
+    "--cap-add",
+    "--security-opt",
+    "--network",
+    "--net",
+    "--volume",
+    "-v",
+    "--mount",
+    "--device",
+    "--device-cgroup-rule",
+    "--pid",
+    "--ipc",
+    "--userns",
+    "--user",
+    "--cgroupns",
+    "--add-host",
+    "--gpus",
+    "--read-only",
+    "--tmpfs",
+)
+
+
+def _validate_extra_run_args(args: Sequence[str]) -> None:
+    """Reject extra ``docker run`` args that would break the isolation contract.
+
+    A caller can otherwise pass ``["--privileged"]``, ``["--network=host"]``,
+    ``["-v", "/:/host:rw"]``, etc. and silently undo every isolation flag
+    this tool sets. The blocklist covers the obvious offenders; operators
+    that genuinely need one of these flags should subclass the tool or
+    build their own argv rather than slip past the check.
+    """
+    bad: list[str] = []
+    for raw in args:
+        if not raw.startswith("-"):
+            continue
+        # Split off any "=value" tail so "--network=host" matches "--network".
+        flag = raw.split("=", 1)[0]
+        if flag in _BLOCKED_EXTRA_RUN_FLAGS:
+            bad.append(raw)
+    if bad:
+        raise ValueError(
+            "extra_run_args contains flags that would dismantle DockerShellTool's "
+            f"isolation defaults: {bad}. Override these via dedicated constructor "
+            "arguments (network, host_workdir, mount_readonly, read_only_root, "
+            "memory, pids_limit, user) or subclass the tool if you really need "
+            "to relax the sandbox."
+        )
+
+
+class DockerNotAvailableError(RuntimeError):
+    """Raised when the configured docker binary cannot be reached."""
+
+
+def is_docker_available(binary: str = "docker") -> bool:
+    """Return ``True`` if ``binary`` is on PATH and the daemon responds."""
+    if shutil.which(binary) is None:
+        return False
+    try:
+        out = subprocess.run(  # noqa: S603  # nosec B603 - argv is built from trusted binary name
+            [binary, "version", "--format", "{{.Server.Version}}"],
+            capture_output=True,
+            timeout=5.0,
+            check=False,
+        )
+    except (OSError, subprocess.TimeoutExpired):
+        return False
+    return out.returncode == 0 and bool(out.stdout.strip())
+
+
+# ---------------------------------------------------------------------------
+# Pure argv builders. Kept side-effect-free so unit tests don't need Docker.
+# ---------------------------------------------------------------------------
+
+
+def build_run_argv(
+    *,
+    binary: str,
+    image: str,
+    container_name: str,
+    user: str,
+    network: str,
+    memory: str,
+    pids_limit: int,
+    workdir: str,
+    host_workdir: str | None,
+    mount_readonly: bool,
+    read_only_root: bool,
+    extra_env: Mapping[str, str] | None,
+    extra_args: Sequence[str] | None,
+) -> list[str]:
+    """Build the ``docker run -d`` argv that starts the long-lived container.
+
+    The container runs ``sleep infinity`` so it stays alive while the
+    session uses ``docker exec`` for individual commands.
+    """
+    argv: list[str] = [
+        binary,
+        "run",
+        "-d",
+        "--rm",
+        "--name",
+        container_name,
+        "--user",
+        user,
+        "--network",
+        network,
+        "--memory",
+        memory,
+        "--pids-limit",
+        str(pids_limit),
+        "--cap-drop",
+        "ALL",
+        "--security-opt",
+        "no-new-privileges",
+        "--tmpfs",
+        "/tmp:rw,nosuid,nodev,size=64m",  # noqa: S108,  # nosec B108 - tmpfs inside the container, not on the host
+        "--workdir",
+        workdir,
+    ]
+    if read_only_root:
+        argv.append("--read-only")
+    if host_workdir is not None:
+        ro = "ro" if mount_readonly else "rw"
+        argv.extend(["-v", f"{host_workdir}:{workdir}:{ro}"])
+    if extra_env:
+        for k, v in extra_env.items():
+            argv.extend(["-e", f"{k}={v}"])
+    if extra_args:
+        argv.extend(extra_args)
+    argv.extend([image, "sleep", "infinity"])
+    return argv
+
+
+def build_exec_argv(
+    *,
+    binary: str,
+    container_name: str,
+    interactive: bool,
+    shell: str = "bash",
+) -> list[str]:
+    """Build the ``docker exec -i <container> <shell>`` argv.
+
+    For persistent mode this is the long-lived shell that
+    :class:`ShellSession` reads/writes via stdin/stdout pipes.
+    """
+    argv = [binary, "exec", "-i", container_name, shell]
+    if not interactive:
+        # Stateless: ``docker exec`` is run per-command; <shell> -c <cmd> is
+        # appended later by run_stateless.
+        argv.extend(["-c"])  # caller appends the command
+    elif shell == "bash":
+        argv.extend(["--noprofile", "--norc"])
+    return argv
+
+
+# ---------------------------------------------------------------------------
+# DockerShellTool
+# ---------------------------------------------------------------------------
+
+
+_PERSISTENT_DESCRIPTION = (
+    "Execute a single shell command inside an isolated Docker container "
+    "and return its stdout, stderr, and exit code. Commands run in a "
+    "persistent session so `cd` and environment variables from previous "
+    "calls are preserved within the container."
+)
+
+_STATELESS_DESCRIPTION = (
+    "Execute a single shell command inside an isolated Docker container "
+    "and return its stdout, stderr, and exit code. Each command runs in a "
+    "fresh container, so `cd` and environment variables do not persist "
+    "between calls."
+)
+
+
+def _default_description(mode: ShellMode) -> str:
+    return _PERSISTENT_DESCRIPTION if mode == "persistent" else _STATELESS_DESCRIPTION
+
+
+class DockerShellTool:
+    """Shell tool that runs commands inside a Docker (or compatible) container.
+
+    **Single-session ownership.** In persistent mode this tool owns a
+    long-lived container plus the bash REPL inside it; it is intended to
+    be owned by a single conversation / agent session — i.e. one user.
+    Do not share one instance across users, tenants, or concurrent
+    conversations: state leaks together inside the container and commands
+    queue behind each other. Create one tool per session and close it
+    (or use ``async with``) when the session ends. If a shared instance
+    is genuinely required, use ``mode="stateless"``. See the module
+    docstring for more.
+
+    Args:
+        image: OCI image to run. Defaults to a small Microsoft-maintained
+            base image. Override with anything that includes ``bash`` and
+            (for persistent mode) ``sleep``.
+        container_name: Optional explicit name. When ``None`` a unique
+            name is generated per instance.
+        mode: ``"persistent"`` (default) keeps a single long-lived
+            container with `cd`/`export` carrying across calls.
+            ``"stateless"`` runs each command in a fresh ``docker run --rm``.
+        host_workdir: Optional host directory to mount into the container
+            at ``workdir``. Mounted read-only by default; pass
+            ``mount_readonly=False`` to allow writes.
+        workdir: Path inside the container. Default ``/workspace``.
+        mount_readonly: When ``True`` (default), mount ``host_workdir`` ro.
+        network: Docker network mode. Default ``"none"`` for no network.
+        memory: Container memory limit (e.g. ``"512m"``, ``"2g"``).
+        pids_limit: Max processes inside the container.
+        user: ``UID:GID`` to run as. Default ``65534:65534`` (nobody).
+        read_only_root: Mount the root filesystem read-only. Default ``True``.
+        extra_run_args: Additional args appended to ``docker run``.
+
+            .. warning::
+               This parameter can dismantle the tool's isolation
+               contract. Flags that would undo the default sandbox
+               (``--privileged``, ``--cap-add``, ``--security-opt``,
+               ``--network``/``--net``, ``-v``/``--volume``,
+               ``--mount``, ``--device``, ``--pid``, ``--ipc``,
+               ``--userns``, ``--user``, ``--read-only``,
+               ``--tmpfs``, ``--add-host``, ``--gpus``, ``--cgroupns``,
+               ``--device-cgroup-rule``) are rejected at construction
+               time. Override the corresponding dedicated argument
+               (``network``, ``host_workdir``, ``mount_readonly``,
+               ``read_only_root``, ``user``, etc.) instead. If you
+               genuinely need to relax the sandbox further, subclass
+               the tool — don't slip past this check.
+        env: Environment variables to set inside the container. These are
+            passed via ``-e`` and apply to every command.
+        policy: Optional :class:`ShellPolicy`. Less critical than for
+            ``LocalShellTool`` since the container is the intended
+            isolation layer, but useful as a UX pre-filter (and for audit
+            logging). Defaults to an empty policy; supply patterns
+            explicitly to enable filtering.
+        timeout: Per-command timeout in seconds.
+        max_output_bytes: Combined stdout/stderr byte cap before truncation.
+        approval_mode: Controls the FunctionTool approval gate. Unlike
+            ``LocalShellTool``, ``"never_require"`` is permitted without
+            ``acknowledge_unsafe`` because the container — when launched
+            with the default isolation flags and a trusted runtime — is
+            the intended boundary rather than approval.
+        on_command: Audit hook fired for every allowed command.
+        docker_binary: Override (e.g. ``"podman"``).
+        shell: Shell binary to invoke inside the container. Defaults to
+            ``"bash"``; pass ``"sh"`` for minimal images such as Alpine
+            that don't ship bash. Anything else must be present on
+            ``$PATH`` inside the image.
+    """
+
+    def __init__(
+        self,
+        *,
+        image: str = DEFAULT_IMAGE,
+        container_name: str | None = None,
+        mode: ShellMode = "persistent",
+        host_workdir: str | os.PathLike[str] | None = None,
+        workdir: str = DEFAULT_WORKDIR,
+        mount_readonly: bool = True,
+        network: str = DEFAULT_NETWORK,
+        memory: str = DEFAULT_MEMORY,
+        pids_limit: int = DEFAULT_PIDS_LIMIT,
+        user: str = DEFAULT_CONTAINER_USER,
+        read_only_root: bool = True,
+        extra_run_args: Sequence[str] | None = None,
+        env: Mapping[str, str] | None = None,
+        policy: ShellPolicy | None = None,
+        timeout: float | None = 30.0,
+        max_output_bytes: int = 64 * 1024,
+        approval_mode: Literal["always_require", "never_require"] = "always_require",
+        on_command: Callable[[str], None] | None = None,
+        docker_binary: str = "docker",
+        shell: str = "bash",
+    ) -> None:
+        if mode not in ("persistent", "stateless"):
+            raise ValueError(f"mode must be 'persistent' or 'stateless', got {mode!r}")
+        _validate_extra_run_args(tuple(extra_run_args or ()))
+        self._image = image
+        self._container_name = container_name or f"af-shell-{secrets.token_hex(6)}"
+        self._mode: ShellMode = mode
+        self._host_workdir: str | None = os.fspath(host_workdir) if host_workdir is not None else None
+        self._workdir = workdir
+        self._mount_readonly = mount_readonly
+        self._network = network
+        self._memory = memory
+        self._pids_limit = pids_limit
+        self._user = user
+        self._read_only_root = read_only_root
+        self._extra_run_args = tuple(extra_run_args or ())
+        self._env = dict(env or {})
+        self._policy = policy or ShellPolicy()
+        self._timeout = timeout
+        self._max_output_bytes = max_output_bytes
+        self._approval_mode: Literal["always_require", "never_require"] = approval_mode
+        self._on_command = on_command
+        self._binary = docker_binary
+        self._shell = shell
+
+        self._session: ShellSession | None = None
+        self._container_started = False
+        self._lifecycle_lock: asyncio.Lock | None = None
+
+    def _get_lifecycle_lock(self) -> asyncio.Lock:
+        if self._lifecycle_lock is None:
+            self._lifecycle_lock = asyncio.Lock()
+        return self._lifecycle_lock
+
+    # ------------------------------------------------------------------ lifecycle
+
+    async def start(self) -> None:
+        """Pull/start the container and (if persistent) the inner shell session."""
+        # Stateless mode never uses the long-lived container — every call goes
+        # through ``docker run --rm`` — so start()/close() are no-ops.
+        if self._mode == "stateless":
+            return
+        async with self._get_lifecycle_lock():
+            if self._container_started:
+                if self._session is not None:
+                    await self._session.start()
+                return
+            await self._start_container()
+            self._container_started = True
+            argv = build_exec_argv(
+                binary=self._binary,
+                container_name=self._container_name,
+                interactive=True,
+                shell=self._shell,  # nosec B604 - 'shell' is the binary name kwarg, not subprocess shell=True
+            )
+            self._session = ShellSession(
+                argv,
+                workdir=None,  # workdir is set on the container itself
+                env=None,
+                max_output_bytes=self._max_output_bytes,
+            )
+            await self._session.start()
+
+    async def close(self) -> None:
+        """Stop the inner shell session and tear down the container."""
+        if self._mode == "stateless":
+            return
+        async with self._get_lifecycle_lock():
+            if self._session is not None:
+                try:
+                    await self._session.close()
+                finally:
+                    self._session = None
+            if self._container_started:
+                await self._stop_container()
+                self._container_started = False
+
+    async def __aenter__(self) -> DockerShellTool:
+        await self.start()
+        return self
+
+    async def __aexit__(self, *_exc: object) -> None:
+        await self.close()
+
+    # ------------------------------------------------------------------ execution
+
+    async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
+        """Execute ``command`` inside the container and return its result.
+
+        Args:
+            command: Shell command to execute.
+            timeout: Optional per-call timeout in seconds overriding the
+                tool's configured default. Enforced inside the executor
+                (kills the container / interrupts the bash REPL) so the
+                caller does not need to wrap the call in
+                :func:`asyncio.wait_for`.
+        """
+        request = ShellRequest(command=command, workdir=self._workdir)
+        decision = self._policy.evaluate(request)
+        if decision.decision == "deny":
+            raise ShellCommandError(f"Command rejected by policy: {decision.reason}")
+        if self._on_command is not None:
+            try:
+                self._on_command(command)
+            except Exception:
+                logger.exception("on_command hook raised")
+
+        effective_timeout = self._timeout if timeout is None else timeout
+
+        if self._mode == "persistent":
+            if self._session is None:
+                await self.start()
+            if self._session is None:
+                raise RuntimeError("DockerShellTool session failed to start")
+            return await self._session.run(command, timeout=effective_timeout)
+
+        return await self._run_stateless(command, timeout=effective_timeout)
+
+    # ------------------------------------------------------------------ stateless
+
+    async def _run_stateless(self, command: str, *, timeout: float | None) -> ShellResult:
+        """Run a single command in a fresh ``docker run --rm`` container."""
+        per_call_name = f"af-shell-{secrets.token_hex(6)}"
+        argv = [
+            self._binary,
+            "run",
+            "--rm",
+            "-i",
+            "--name",
+            per_call_name,
+            "--user",
+            self._user,
+            "--network",
+            self._network,
+            "--memory",
+            self._memory,
+            "--pids-limit",
+            str(self._pids_limit),
+            "--cap-drop",
+            "ALL",
+            "--security-opt",
+            "no-new-privileges",
+            "--tmpfs",
+            "/tmp:rw,nosuid,nodev,size=64m",  # noqa: S108,  # nosec B108 - tmpfs inside the container, not on the host
+            "--workdir",
+            self._workdir,
+        ]
+        if self._read_only_root:
+            argv.append("--read-only")
+        if self._host_workdir is not None:
+            ro = "ro" if self._mount_readonly else "rw"
+            argv.extend(["-v", f"{self._host_workdir}:{self._workdir}:{ro}"])
+        for k, v in self._env.items():
+            argv.extend(["-e", f"{k}={v}"])
+        argv.extend(self._extra_run_args)
+        argv.extend([self._image, self._shell, "-c", command])
+
+        started = time.monotonic()
+        proc = await asyncio.create_subprocess_exec(
+            *argv,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        )
+        timed_out = False
+        try:
+            stdout_bytes, stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+        except asyncio.TimeoutError:
+            timed_out = True
+            # Kill the container by name. ``--rm`` reaps it on a clean
+            # exit; if ``docker kill`` itself hangs or returns non-zero
+            # (daemon stuck, container with unkillable kernel task) the
+            # ``--rm`` reaper never fires and the container is leaked,
+            # so we explicitly fall back to ``docker rm -f``.
+            killer = await asyncio.create_subprocess_exec(
+                self._binary,
+                "kill",
+                "--signal",
+                "KILL",
+                per_call_name,
+                stdout=asyncio.subprocess.DEVNULL,
+                stderr=asyncio.subprocess.DEVNULL,
+            )
+            kill_ok = False
+            try:
+                rc = await asyncio.wait_for(killer.wait(), timeout=5.0)
+                kill_ok = rc == 0
+            except asyncio.TimeoutError:
+                killer.kill()
+                with contextlib.suppress(Exception):
+                    await killer.wait()
+            if not kill_ok:
+                logger.warning(
+                    "docker kill of stateless container %s did not succeed; "
+                    "attempting docker rm -f to avoid a container leak",
+                    per_call_name,
+                )
+                reaper = await asyncio.create_subprocess_exec(
+                    self._binary,
+                    "rm",
+                    "-f",
+                    per_call_name,
+                    stdout=asyncio.subprocess.DEVNULL,
+                    stderr=asyncio.subprocess.PIPE,
+                )
+                try:
+                    _, rerr = await asyncio.wait_for(reaper.communicate(), timeout=5.0)
+                    if reaper.returncode != 0:
+                        logger.error(
+                            "docker rm -f %s failed (rc=%s, err=%s); container may be leaked",
+                            per_call_name,
+                            reaper.returncode,
+                            rerr.decode("utf-8", errors="replace").strip(),
+                        )
+                except asyncio.TimeoutError:
+                    reaper.kill()
+                    with contextlib.suppress(Exception):
+                        await reaper.wait()
+                    logger.error(
+                        "docker rm -f %s timed out; container may be leaked",
+                        per_call_name,
+                    )
+            try:
+                stdout_bytes, stderr_bytes = await proc.communicate()
+            except Exception:
+                # Pipe/socket failures after kill leave us with no captured
+                # output. Log so operators can correlate empty results with
+                # the timeout teardown (otherwise the model just sees
+                # exit_code=-1 with no stdout/stderr).
+                logger.warning(
+                    "failed to drain stdout/stderr after killing stateless container %s",
+                    per_call_name,
+                    exc_info=True,
+                )
+                stdout_bytes, stderr_bytes = b"", b""
+
+        duration_ms = int((time.monotonic() - started) * 1000)
+        stdout_str, stdout_truncated = _truncate_bytes(stdout_bytes or b"", self._max_output_bytes)
+        stderr_str, stderr_truncated = _truncate_bytes(stderr_bytes or b"", self._max_output_bytes)
+        return ShellResult(
+            stdout=stdout_str,
+            stderr=stderr_str,
+            exit_code=proc.returncode if proc.returncode is not None else -1,
+            duration_ms=duration_ms,
+            truncated=stdout_truncated or stderr_truncated,
+            timed_out=timed_out,
+        )
+
+    # ------------------------------------------------------------------ container ops
+
+    async def _start_container(self) -> None:
+        argv = build_run_argv(
+            binary=self._binary,
+            image=self._image,
+            container_name=self._container_name,
+            user=self._user,
+            network=self._network,
+            memory=self._memory,
+            pids_limit=self._pids_limit,
+            workdir=self._workdir,
+            host_workdir=self._host_workdir,
+            mount_readonly=self._mount_readonly,
+            read_only_root=self._read_only_root,
+            extra_env=self._env,
+            extra_args=self._extra_run_args,
+        )
+        proc = await asyncio.create_subprocess_exec(
+            *argv,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        )
+        out, err = await proc.communicate()
+        if proc.returncode != 0:
+            raise DockerNotAvailableError(
+                f"Failed to start container ({proc.returncode}): {err.decode('utf-8', errors='replace').strip()}"
+            )
+        logger.info(
+            "started docker container %s (id=%s)",
+            self._container_name,
+            out.decode("utf-8", errors="replace").strip()[:12],
+        )
+
+    async def _stop_container(self) -> None:
+        # Use docker rm -f for a hard shutdown. With --rm on the run
+        # command, this also reaps the container.
+        async def _attempt(timeout: float) -> tuple[int | None, str]:
+            """Run a single ``docker rm -f`` attempt. Returns (rc, stderr)."""
+            proc = await asyncio.create_subprocess_exec(
+                self._binary,
+                "rm",
+                "-f",
+                self._container_name,
+                stdout=asyncio.subprocess.DEVNULL,
+                stderr=asyncio.subprocess.PIPE,
+            )
+            try:
+                _, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+                return proc.returncode, err.decode("utf-8", errors="replace").strip()
+            except asyncio.TimeoutError:
+                proc.kill()
+                with contextlib.suppress(Exception):
+                    await proc.wait()
+                return None, "docker rm -f timed out"
+
+        rc, err = await _attempt(timeout=10.0)
+        if rc == 0:
+            return
+        logger.warning(
+            "docker rm -f %s did not succeed on first attempt (rc=%s, err=%s); retrying",
+            self._container_name,
+            rc,
+            err,
+        )
+        rc, err = await _attempt(timeout=5.0)
+        if rc != 0:
+            # Container may have been leaked. Surface the name so an
+            # operator can clean it up manually.
+            logger.error(
+                "docker rm -f %s failed after retry (rc=%s, err=%s); "
+                "container may be running and require manual cleanup",
+                self._container_name,
+                rc,
+                err,
+            )
+
+    # ------------------------------------------------------------------ AF wiring
+
+    def as_function(
+        self,
+        *,
+        name: str = "run_shell",
+        description: str | None = None,
+    ) -> FunctionTool:
+        """Return a :class:`~agent_framework.FunctionTool` bound to this instance."""
+
+        async def _run_shell(command: str) -> str:
+            try:
+                result = await self.run(command)
+            except ShellCommandError as exc:
+                return str(exc)
+            return result.format_for_model()
+
+        effective_description = description or _default_description(self._mode)
+        _run_shell.__doc__ = effective_description
+        return tool(
+            func=_run_shell,
+            name=name,
+            description=effective_description,
+            approval_mode=self._approval_mode,
+            kind=SHELL_TOOL_KIND_VALUE,
+        )
@@ -0,0 +1,281 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Shell environment context provider.
+
+Probes the underlying shell (OS, family/version, working directory,
+configured CLI tools) once per provider lifetime and injects an
+instructions block so the agent emits commands in the correct shell
+idiom rather than defaulting to bash syntax inside a PowerShell session
+or vice versa. The probe runs through any :class:`ShellExecutor`, so the
+same provider works with both :class:`LocalShellTool` and
+:class:`DockerShellTool`.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import platform
+import re
+import sys
+from collections.abc import Callable, Mapping, Sequence
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Any, ClassVar
+
+from agent_framework import AgentSession, ContextProvider, SessionContext, SupportsAgentRun
+
+from ._executor_base import ShellExecutor
+from ._types import ShellCommandError, ShellExecutionError, ShellResult, ShellTimeoutError
+
+
+class ShellFamily(str, Enum):
+    """Shell families recognised by the provider."""
+
+    POSIX = "posix"
+    POWERSHELL = "powershell"
+
+
+@dataclass(frozen=True)
+class ShellEnvironmentSnapshot:
+    """Point-in-time snapshot of the shell environment.
+
+    Attributes:
+        family: Detected (or configured) shell family.
+        os_description: A short OS description from :mod:`platform`.
+        shell_version: Reported shell version, or ``None`` when probing
+            failed or the shell did not report one.
+        working_directory: CWD reported by the shell, or empty string
+            when probing failed.
+        tool_versions: Map of probed CLI tool name to reported version.
+            ``None`` values indicate the tool was not installed or did
+            not respond to ``--version`` within the probe timeout.
+    """
+
+    family: ShellFamily
+    os_description: str
+    shell_version: str | None
+    working_directory: str
+    tool_versions: Mapping[str, str | None]
+
+
+@dataclass(frozen=True)
+class ShellEnvironmentProviderOptions:
+    """Configuration for :class:`ShellEnvironmentProvider`.
+
+    Attributes:
+        probe_tools: CLI tools whose ``--version`` output is probed.
+        override_family: Optional override for the auto-detected family.
+            When ``None``, the family is inferred from :data:`sys.platform`
+            (Windows → PowerShell, otherwise POSIX). Set this when
+            running against a non-default shell (e.g. bash on Windows
+            via WSL, or pwsh on Linux).
+        probe_timeout: Per-probe execution timeout in seconds. Probes
+            that exceed this are recorded as missing rather than raised
+            to the agent.
+        instructions_formatter: Optional callable that renders the
+            snapshot as the instructions block. When ``None``, the
+            built-in :func:`default_instructions_formatter` is used.
+    """
+
+    probe_tools: Sequence[str] = field(
+        default_factory=lambda: ("git", "node", "python", "docker"),
+    )
+    override_family: ShellFamily | None = None
+    probe_timeout: float = 5.0
+    instructions_formatter: Callable[[ShellEnvironmentSnapshot], str] | None = None
+
+
+_TOOL_NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")
+
+
+def _detect_family() -> ShellFamily:
+    return ShellFamily.POWERSHELL if sys.platform == "win32" else ShellFamily.POSIX
+
+
+def _first_non_empty_line(text: str) -> str | None:
+    for line in text.splitlines():
+        stripped = line.strip()
+        if stripped:
+            return stripped
+    return None
+
+
+class ShellEnvironmentProvider(ContextProvider):
+    """:class:`ContextProvider` that injects a shell-environment block.
+
+    The provider runs a small set of probe commands against the supplied
+    :class:`ShellExecutor` once, caches the resulting
+    :class:`ShellEnvironmentSnapshot`, and on every ``before_run`` adds a
+    formatted instructions block to the session context. It does not
+    register any tools.
+
+    Probe failures from a narrow set of expected error types are recorded
+    as ``None`` fields in the snapshot (per-probe timeout, policy
+    rejection, executor spawn failure). Other exceptions propagate so
+    bugs are not silently swallowed.
+
+    A missing CLI never fails the agent: the model simply sees fewer
+    hints in its system prompt.
+    """
+
+    DEFAULT_SOURCE_ID: ClassVar[str] = "shell_environment"
+
+    def __init__(
+        self,
+        executor: ShellExecutor,
+        options: ShellEnvironmentProviderOptions | None = None,
+        *,
+        source_id: str | None = None,
+    ) -> None:
+        super().__init__(source_id or self.DEFAULT_SOURCE_ID)
+        self._executor = executor
+        self._options = options or ShellEnvironmentProviderOptions()
+        self._snapshot: ShellEnvironmentSnapshot | None = None
+        self._lock = asyncio.Lock()
+
+    @property
+    def current_snapshot(self) -> ShellEnvironmentSnapshot | None:
+        """The most recent snapshot, or ``None`` before the first probe."""
+        return self._snapshot
+
+    async def refresh(self) -> ShellEnvironmentSnapshot:
+        """Force a re-probe and replace the cached snapshot.
+
+        Useful when the agent has changed something the snapshot depends
+        on, e.g. installed a new CLI mid-session.
+        """
+        async with self._lock:
+            snapshot = await self._probe()
+            self._snapshot = snapshot
+            return snapshot
+
+    async def before_run(
+        self,
+        *,
+        agent: SupportsAgentRun,
+        session: AgentSession,
+        context: SessionContext,
+        state: dict[str, Any],
+    ) -> None:
+        snapshot = await self._get_or_probe()
+        formatter = self._options.instructions_formatter or default_instructions_formatter
+        context.extend_instructions(self.source_id, formatter(snapshot))
+
+    async def _get_or_probe(self) -> ShellEnvironmentSnapshot:
+        # Double-checked: return any already-cached snapshot without
+        # acquiring the lock; otherwise serialize the first probe so
+        # concurrent first-callers wait for a single result. A failed
+        # probe leaves _snapshot as None so the next call retries.
+        if self._snapshot is not None:
+            return self._snapshot
+        async with self._lock:
+            if self._snapshot is None:
+                self._snapshot = await self._probe()
+            return self._snapshot
+
+    async def _probe(self) -> ShellEnvironmentSnapshot:
+        family = self._options.override_family or _detect_family()
+        await self._executor.start()
+
+        shell_version, working_dir = await self._probe_shell_and_cwd(family)
+
+        tool_versions: dict[str, str | None] = {}
+        for tool in self._options.probe_tools:
+            # Skip case-insensitive duplicates so a caller passing
+            # ("git", "GIT") does not probe twice.
+            if tool.lower() in {existing.lower() for existing in tool_versions}:
+                continue
+            tool_versions[tool] = await self._probe_tool_version(tool)
+
+        return ShellEnvironmentSnapshot(
+            family=family,
+            os_description=platform.platform(),
+            shell_version=shell_version,
+            working_directory=working_dir,
+            tool_versions=tool_versions,
+        )
+
+    async def _probe_shell_and_cwd(self, family: ShellFamily) -> tuple[str | None, str]:
+        if family is ShellFamily.POWERSHELL:
+            command = (
+                'Write-Output ("VERSION=" + $PSVersionTable.PSVersion.ToString()); '
+                'Write-Output ("CWD=" + (Get-Location).Path)'
+            )
+        else:
+            command = 'echo "VERSION=${BASH_VERSION:-${ZSH_VERSION:-unknown}}"; echo "CWD=$(pwd)"'
+
+        result = await self._run_probe(command)
+        if result is None:
+            return None, ""
+
+        version: str | None = None
+        cwd = ""
+        for raw in result.stdout.splitlines():
+            line = raw.strip()
+            if line.startswith("VERSION="):
+                value = line[len("VERSION=") :].strip()
+                version = None if not value or value == "unknown" else value
+            elif line.startswith("CWD="):
+                cwd = line[len("CWD=") :].strip()
+        return version, cwd
+
+    async def _probe_tool_version(self, tool: str) -> str | None:
+        # Reject anything that is not a plain identifier — the tool name
+        # is interpolated into a shell command, so quotes, $, ;, |, &,
+        # whitespace, etc. would allow command injection if the tool list
+        # were sourced from untrusted input.
+        if not tool or not _TOOL_NAME_PATTERN.match(tool):
+            return None
+
+        result = await self._run_probe(f"{tool} --version")
+        if result is None or result.exit_code != 0:
+            return None
+
+        # Some CLIs (older java, gcc) emit --version on stderr.
+        line = _first_non_empty_line(result.stdout) or _first_non_empty_line(result.stderr)
+        return line if line else None
+
+    async def _run_probe(self, command: str) -> ShellResult | None:
+        try:
+            return await self._executor.run(command, timeout=self._options.probe_timeout)
+        except asyncio.TimeoutError:
+            return None
+        except (ShellCommandError, ShellExecutionError, ShellTimeoutError):
+            return None
+
+
+def default_instructions_formatter(snapshot: ShellEnvironmentSnapshot) -> str:
+    """Render ``snapshot`` as the default instructions block.
+
+    Public so callers that want to wrap or extend the default can call
+    it from a custom ``instructions_formatter``.
+    """
+    lines: list[str] = ["## Shell environment"]
+    version_suffix = f" {snapshot.shell_version}" if snapshot.shell_version else ""
+
+    if snapshot.family is ShellFamily.POWERSHELL:
+        lines.append(f"You are operating a PowerShell{version_suffix} session on {snapshot.os_description}.")
+        lines.append("Use PowerShell idioms, NOT bash:")
+        lines.append("- Set environment variables with `$env:NAME = 'value'` (NOT `NAME=value`).")
+        lines.append("- Change directory with `Set-Location` or `cd`. Paths use `\\` separators.")
+        lines.append("- Reference environment variables as `$env:NAME` (NOT `$NAME`).")
+        lines.append("- The system temp directory is `[System.IO.Path]::GetTempPath()` (NOT `/tmp`).")
+        lines.append("- Pipe to `Out-Null` to suppress output (NOT `> /dev/null`).")
+    else:
+        lines.append(f"You are operating a POSIX shell{version_suffix} session on {snapshot.os_description}.")
+        lines.append("Use POSIX shell idioms (bash/sh).")
+        lines.append("- Set environment variables for the next command with `export NAME=value`.")
+        lines.append("- Reference environment variables as `$NAME` or `${NAME}`.")
+        lines.append("- Paths use `/` separators.")
+
+    if snapshot.working_directory:
+        lines.append(f"Working directory: {snapshot.working_directory}")
+
+    installed = [f"{name} ({version})" for name, version in snapshot.tool_versions.items() if version]
+    missing = [name for name, version in snapshot.tool_versions.items() if version is None]
+    if installed:
+        lines.append("Available CLIs: " + ", ".join(installed))
+    if missing:
+        lines.append("Not installed: " + ", ".join(missing))
+
+    return "\n".join(lines)
@@ -0,0 +1,93 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Stateless shell executor.
+
+Each call to :func:`run_stateless` spawns a fresh subprocess, captures
+stdout/stderr concurrently, enforces a timeout by killing the whole process
+tree, and truncates oversized output. Matches the behaviour of AutoGen's
+``LocalCommandLineCodeExecutor`` and OpenAI Agents SDK's ``local_shell``
+protocol.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import subprocess  # noqa: S404  # nosec B404 - executing user shell commands is the whole point
+import sys
+import time
+from collections.abc import Mapping, Sequence
+
+from ._killtree import kill_process_tree
+from ._resolve import is_powershell
+from ._truncate import truncate_head_tail as _truncate
+from ._types import ShellResult
+
+
+def _popen_kwargs_for_group() -> dict[str, object]:
+    """Platform-specific process-group isolation so we can kill children too."""
+    if sys.platform == "win32":
+        # CREATE_NEW_PROCESS_GROUP lets CTRL_BREAK_EVENT hit the whole group.
+        return {"creationflags": subprocess.CREATE_NEW_PROCESS_GROUP}  # type: ignore[attr-defined]
+    return {"start_new_session": True}
+
+
+async def run_stateless(
+    argv: Sequence[str],
+    command: str,
+    *,
+    workdir: str | None,
+    env: Mapping[str, str] | None,
+    timeout: float | None,
+    max_output_bytes: int,
+) -> ShellResult:
+    """Execute ``command`` via ``argv`` + ``-c``/``-Command`` + command.
+
+    Args:
+        argv: Base shell invocation (from :func:`resolve_shell` with
+            ``interactive=False``).
+        command: User command string.
+        workdir: Working directory, or ``None`` to inherit.
+        env: Environment variables, or ``None`` to inherit the current
+            process environment.
+        timeout: Seconds before the process tree is killed; ``None`` disables.
+        max_output_bytes: Combined byte cap per stream before truncation.
+    """
+    # For PowerShell we prepend a UTF-8 encoding preamble so powershell.exe
+    # on Windows (cp1252 by default) doesn't mojibake non-ASCII output.
+    if is_powershell(argv):
+        command = "$OutputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false); " + command
+    full_argv = [*argv, command]
+    started = time.monotonic()
+    proc = await asyncio.create_subprocess_exec(
+        *full_argv,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+        cwd=workdir,
+        env=dict(env) if env is not None else None,
+        **_popen_kwargs_for_group(),  # type: ignore[arg-type]
+    )
+
+    timed_out = False
+    try:
+        stdout_bytes, stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        timed_out = True
+        await kill_process_tree(proc)
+        # Drain any queued output.
+        try:
+            stdout_bytes, stderr_bytes = await proc.communicate()
+        except Exception:
+            stdout_bytes, stderr_bytes = b"", b""
+
+    duration_ms = int((time.monotonic() - started) * 1000)
+    stdout_str, stdout_truncated = _truncate(stdout_bytes or b"", max_output_bytes)
+    stderr_str, stderr_truncated = _truncate(stderr_bytes or b"", max_output_bytes)
+
+    return ShellResult(
+        stdout=stdout_str,
+        stderr=stderr_str,
+        exit_code=proc.returncode if proc.returncode is not None else -1,
+        duration_ms=duration_ms,
+        truncated=stdout_truncated or stderr_truncated,
+        timed_out=timed_out,
+    )
@@ -0,0 +1,65 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Shell executor protocol.
+
+A :class:`ShellExecutor` is the swappable backend for shell-tool execution.
+``LocalShellTool`` runs commands directly on the host with no process-level
+isolation; the approval-in-the-loop gate is the intended boundary.
+``DockerShellTool`` runs commands inside a container — when the container
+runtime is trusted and the default isolation flags are kept, the container
+is the intended boundary instead of approval.
+
+The protocol is intentionally minimal so callers can plug in their own
+executor (e.g. a Firecracker microVM, a remote SSH host, a WASI runtime
+that ships a busybox-WASM build) without forking the framework.
+
+**Single-session ownership.** An executor instance — and the shell tool
+that wraps it — is intended to serve a single conversation / agent session,
+i.e. a single user. In persistent mode the executor owns a long-lived
+shell process (and, for ``DockerShellTool``, a long-lived container) whose
+state — working directory, exported variables, command history, in-flight
+background jobs, files written to the container — is visible to every
+subsequent command. A single stdin/stdout pipe serializes every call,
+and the framework does not isolate one caller's state from another's.
+Build one executor / one shell tool per session, treat it as owned by
+that session for its lifetime, and close it when the session ends. Do
+not share a persistent-mode instance across users, tenants, or concurrent
+conversations. If a shared instance is genuinely required, construct the
+shell tool with ``mode="stateless"`` so every call spawns a fresh process
+or container.
+"""
+
+from __future__ import annotations
+
+from typing import Protocol, runtime_checkable
+
+from ._types import ShellResult
+
+
+@runtime_checkable
+class ShellExecutor(Protocol):
+    """Async-context-manageable backend that runs shell commands."""
+
+    async def start(self) -> None:
+        """Eagerly initialise the backend (no-op if already started)."""
+
+    async def close(self) -> None:
+        """Tear down all backend resources. Idempotent."""
+
+    async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
+        """Execute ``command`` and return its result.
+
+        Args:
+            command: The shell command to execute.
+            timeout: Optional per-call timeout in seconds. When ``None``,
+                the executor uses its configured default. Implementations
+                **must** enforce this timeout cancellation-safely (e.g.
+                kill the subprocess or tear down the session on timeout)
+                so callers can rely on the timeout to bound execution
+                without leaking processes on cancellation.
+        """
+        ...
+
+    async def __aenter__(self) -> ShellExecutor: ...
+
+    async def __aexit__(self, *exc: object) -> None: ...
@@ -0,0 +1,132 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Cross-OS process-tree termination.
+
+Delegates to :mod:`psutil` for process introspection when available, with
+a stdlib fallback. Tree-kill matters because a timed-out shell command can
+spawn child processes (``make``, network tools, watchers, …); leaving
+them running would defeat the timeout.
+
+Notes:
+* On Windows, ``taskkill.exe`` is resolved to its absolute system path so
+  a modified ``PATH`` cannot redirect the call to a different binary.
+* psutil's ``Process.children(recursive=True)`` walks parent-child
+  relationships via OS APIs (``CreateToolhelp32Snapshot`` on Windows,
+  ``/proc`` on Linux, ``proc_listpids`` on macOS), which is why it is
+  preferred over a hand-rolled platform-conditional implementation.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import contextlib
+import os
+import signal
+import sys
+
+try:  # pragma: no cover - importable on every platform we ship
+    import psutil  # type: ignore[import-untyped]
+
+    _has_psutil = True
+except ImportError:  # pragma: no cover
+    _has_psutil = False
+    psutil = None  # type: ignore[assignment]
+
+
+_taskkill_path: str | None = None
+
+
+def _resolve_taskkill() -> str:
+    """Absolute path to taskkill.exe to defeat PATH poisoning."""
+    global _taskkill_path
+    if _taskkill_path is not None:
+        return _taskkill_path
+    system_root = os.environ.get("SystemRoot") or os.environ.get("SYSTEMROOT") or r"C:\Windows"  # noqa: SIM112
+    candidate = os.path.join(system_root, "System32", "taskkill.exe")
+    _taskkill_path = candidate if os.path.isfile(candidate) else "taskkill"
+    return _taskkill_path
+
+
+async def kill_process_tree(
+    proc: asyncio.subprocess.Process,
+    *,
+    grace: float = 2.0,
+) -> None:
+    """Terminate ``proc`` and all of its descendants. Best-effort, never raises."""
+    if proc.returncode is not None:
+        return
+    if _has_psutil:
+        await _kill_via_psutil(proc, grace=grace)
+        return
+    await _kill_via_stdlib(proc, grace=grace)
+
+
+async def _kill_via_psutil(
+    proc: asyncio.subprocess.Process,
+    *,
+    grace: float,
+) -> None:
+    if psutil is None:
+        raise RuntimeError("_kill_via_psutil called without psutil available")
+    try:
+        parent = psutil.Process(proc.pid)
+    except psutil.NoSuchProcess:
+        return
+    try:
+        descendants = parent.children(recursive=True)
+    except psutil.NoSuchProcess:
+        descendants = []
+    victims = [parent, *descendants]
+
+    # Phase 1: SIGTERM (or terminate() on Windows, which also asks nicely).
+    for v in victims:
+        with contextlib.suppress(psutil.NoSuchProcess, psutil.AccessDenied):
+            v.terminate()
+
+    # Wait briefly for graceful exit.
+    with contextlib.suppress(asyncio.TimeoutError):
+        await asyncio.wait_for(proc.wait(), timeout=grace)
+
+    # Phase 2: SIGKILL anything still alive.
+    for v in victims:
+        with contextlib.suppress(psutil.NoSuchProcess, psutil.AccessDenied):
+            if v.is_running():
+                v.kill()
+    with contextlib.suppress(asyncio.TimeoutError):
+        await asyncio.wait_for(proc.wait(), timeout=grace)
+
+
+async def _kill_via_stdlib(
+    proc: asyncio.subprocess.Process,
+    *,
+    grace: float,
+) -> None:
+    """Fallback when psutil isn't installed. Less robust on Windows."""
+    if sys.platform == "win32":
+        try:
+            killer = await asyncio.create_subprocess_exec(
+                _resolve_taskkill(),
+                "/T",
+                "/F",
+                "/PID",
+                str(proc.pid),
+                stdout=asyncio.subprocess.DEVNULL,
+                stderr=asyncio.subprocess.DEVNULL,
+            )
+            with contextlib.suppress(asyncio.TimeoutError):
+                await asyncio.wait_for(killer.wait(), timeout=grace)
+            if killer.returncode is None:
+                killer.kill()
+        except (FileNotFoundError, OSError):
+            pass
+        with contextlib.suppress(ProcessLookupError, OSError):
+            proc.kill()
+        return
+    try:
+        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
+        with contextlib.suppress(asyncio.TimeoutError):
+            await asyncio.wait_for(proc.wait(), timeout=grace)
+            return
+        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
+    except (ProcessLookupError, PermissionError):
+        pass
@@ -0,0 +1,125 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+r"""Policy model for :class:`LocalShellTool` and :class:`DockerShellTool`.
+
+``ShellPolicy`` is evaluated *before* approval and *before* execution. It
+lets callers define allow/deny rules and an optional final custom callback.
+
+.. warning::
+   **Not a security boundary; not even a security feature.** ``ShellPolicy``
+   is a UX pre-filter: it gives operators a way to surface a friendly error
+   for site-specific patterns (e.g. "we don't run ``ssh`` from this agent",
+   "block our prod hostname") before approval and before execution. It is
+   **not** a defense against a malicious model or prompt-injected input.
+   Regex matching on the command spelling cannot see what the shell will
+   actually execute after expansion. Trivial bypasses include backslash
+   insertion (``r''m -rf /``), variable expansion (``${RM:=rm} -rf /``),
+   interpreter escape hatches (``python -c "import os; os.system('rm -rf /')"``),
+   base64 / hex / printf smuggling (``eval $(printf '\\x72\\x6d -rf /')``),
+   command substitution (``$(base64 -d <<<...)``), envvar splicing
+   (``$(A=r B=m; echo $A$B) -rf /``), and absolute paths
+   (``/usr/bin/rm`` matches ``\\brm\\b`` only when the pattern is loose).
+
+   **No default patterns.** ``ShellPolicy()`` constructs an empty deny-list.
+   The framework deliberately ships no built-in patterns so it does not
+   give a false impression of safety. Survey of competing agent frameworks
+   (LangChain, AutoGen, OpenAI Agents SDK, Claude Code, Goose, Continue.dev,
+   OpenHands, Open Interpreter, Aider, smolagents, LangGraph) found that
+   none use regex matching as a primary security control; AutoGen v2
+   explicitly removed their built-in deny-list.
+
+   The actual security boundary is **(a) approval-in-the-loop** (default
+   ``approval_mode="always_require"``) and **(b) operator trust / sandbox
+   tier**. For untrusted input use ``DockerShellTool`` or
+   ``HyperlightCodeActProvider`` (microVM); pair either with approval gating.
+"""
+
+from __future__ import annotations
+
+import re
+from collections.abc import Callable, Sequence
+from dataclasses import dataclass, field
+from typing import Literal, Union
+
+PatternLike = Union[str, re.Pattern[str]]
+
+
+@dataclass(frozen=True)
+class ShellRequest:
+    """A single command awaiting a policy decision."""
+
+    command: str
+    workdir: str | None = None
+
+
+@dataclass(frozen=True)
+class ShellDecision:
+    """Result of a policy evaluation."""
+
+    decision: Literal["allow", "deny"]
+    reason: str = ""
+
+
+def _compile_patterns(patterns: Sequence[PatternLike]) -> tuple[re.Pattern[str], ...]:
+    compiled: list[re.Pattern[str]] = []
+    for pat in patterns:
+        compiled.append(pat if isinstance(pat, re.Pattern) else re.compile(pat, re.IGNORECASE))
+    return tuple(compiled)
+
+
+@dataclass
+class ShellPolicy:
+    """Layered allow/deny policy for shell commands.
+
+    Evaluation order (first hit wins):
+
+    1. ``denylist`` — if any pattern matches, the command is **denied**.
+    2. ``allowlist`` — if set and no pattern matches, the command is
+       **denied**. When ``allowlist`` is ``None`` the allow rule is skipped.
+    3. ``custom`` — user-supplied callback gets the final say and may return
+       a :class:`ShellDecision` to override allow/deny outcomes.
+    4. Otherwise the command is **allowed**.
+
+    All regex patterns are compiled case-insensitively.
+
+    Defaults are empty: ``ShellPolicy()`` allows every non-empty command.
+    Supply ``denylist`` and/or ``allowlist`` explicitly to enable filtering.
+    See the module docstring for why the framework does not ship default
+    deny patterns.
+    """
+
+    denylist: Sequence[PatternLike] = field(default_factory=tuple)
+    allowlist: Sequence[PatternLike] | None = None
+    custom: Callable[[ShellRequest], ShellDecision | None] | None = None
+
+    _denies: tuple[re.Pattern[str], ...] = field(init=False, repr=False, compare=False)
+    _allows: tuple[re.Pattern[str], ...] | None = field(init=False, repr=False, compare=False)
+
+    def __post_init__(self) -> None:
+        self._denies = _compile_patterns(self.denylist)
+        self._allows = _compile_patterns(self.allowlist) if self.allowlist is not None else None
+
+    def evaluate(self, request: ShellRequest) -> ShellDecision:
+        """Return an allow/deny decision for ``request``.
+
+        Empty/whitespace-only commands are denied (there is nothing to
+        run). With default settings (no denylist, no allowlist) every
+        non-empty command is allowed.
+        """
+        command = request.command.strip()
+        if not command:
+            return ShellDecision("deny", "command is empty")
+        for pat in self._denies:
+            if pat.search(command):
+                return ShellDecision("deny", f"matches denylist pattern: {pat.pattern}")
+        if self._allows is not None and not any(pat.search(command) for pat in self._allows):
+            return ShellDecision("deny", "command does not match allowlist")
+        if self.custom is not None:
+            override = self.custom(request)
+            if override is not None:
+                return override
+        return ShellDecision("allow")
+
+    def evaluate_command(self, command: str) -> ShellDecision:
+        """Convenience: evaluate a bare command with no workdir context."""
+        return self.evaluate(ShellRequest(command=command))
@@ -0,0 +1,111 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Cross-platform shell discovery for :class:`LocalShellTool`."""
+
+from __future__ import annotations
+
+import os
+import shutil
+import sys
+from collections.abc import Sequence
+
+from ._types import ShellExecutionError
+
+_ENV_OVERRIDE = "AGENT_FRAMEWORK_SHELL"
+
+
+def resolve_shell(shell: str | Sequence[str] | None, *, interactive: bool) -> list[str]:
+    """Resolve the shell invocation argv.
+
+    Priority:
+
+    1. Explicit ``shell`` argument (string is split via shlex rules; sequence
+       is used verbatim).
+    2. ``AGENT_FRAMEWORK_SHELL`` environment variable.
+    3. Platform default.
+
+    When ``interactive=False`` (stateless mode), the returned argv is
+    guaranteed to end with a ``-c`` (POSIX) or ``-Command`` (PowerShell)
+    flag so the caller can append a command string verbatim. Overrides
+    that already include the flag are left as-is.
+
+    Args:
+        shell: Optional override supplied by the caller.
+        interactive: When ``True`` (persistent mode), the returned argv is
+            suitable for a long-lived session that reads commands from
+            ``stdin``. When ``False`` (stateless mode), the caller will
+            append a command string to this argv (so the argv must end
+            with ``-c`` / ``-Command``).
+    """
+    if shell is not None:
+        if isinstance(shell, str):
+            import shlex
+
+            parts = shlex.split(shell)
+            if not parts:
+                raise ShellExecutionError("shell override must not be empty")
+        else:
+            parts = list(shell)
+            if not parts:
+                raise ShellExecutionError("shell override must not be empty")
+        return parts if interactive else _ensure_command_flag(parts)
+
+    env_override = os.environ.get(_ENV_OVERRIDE)
+    if env_override:
+        import shlex
+
+        parts = shlex.split(env_override)
+        if parts:
+            return parts if interactive else _ensure_command_flag(parts)
+
+    if sys.platform == "win32":
+        binary = shutil.which("pwsh") or shutil.which("powershell")
+        if binary is None:
+            raise ShellExecutionError(
+                f"Neither 'pwsh' nor 'powershell' was found on PATH. Install PowerShell 7+ or set {_ENV_OVERRIDE}."
+            )
+        if interactive:
+            # Interactive persistent session reads from stdin via '-'.
+            return [binary, "-NoLogo", "-NoProfile", "-NonInteractive", "-Command", "-"]
+        return [binary, "-NoLogo", "-NoProfile", "-NonInteractive", "-Command"]
+
+    for candidate in ("/bin/bash", "/usr/bin/bash", "/bin/sh", "/usr/bin/sh"):
+        if os.path.exists(candidate):
+            if interactive:
+                return [candidate, "--noprofile", "--norc"] if candidate.endswith("bash") else [candidate]
+            return [candidate, "-c"]
+    # Last-ditch fallback: let PATH resolve 'sh'.
+    sh = shutil.which("sh")
+    if sh is None:
+        raise ShellExecutionError(f"No POSIX shell found on PATH. Set {_ENV_OVERRIDE} to override.")
+    return [sh] if interactive else [sh, "-c"]
+
+
+def is_powershell(argv: Sequence[str]) -> bool:
+    """Return True when ``argv[0]`` appears to be PowerShell."""
+    if not argv:
+        return False
+    name = os.path.basename(argv[0]).lower()
+    return name in {"pwsh", "pwsh.exe", "powershell", "powershell.exe"}
+
+
+def _ensure_command_flag(argv: list[str]) -> list[str]:
+    """Append the right ``-c`` / ``-Command`` flag for stateless argv.
+
+    The caller (``run_stateless``) appends the user's command string
+    verbatim to this argv. If a user-supplied override omits the
+    ``-c`` / ``-Command`` flag, the command would be misinterpreted
+    (POSIX shells treat the next positional arg as a script file
+    path). This helper normalises overrides so they execute correctly
+    in stateless mode.
+    """
+    if not argv:
+        return argv
+    last = argv[-1].lower()
+    if is_powershell(argv):
+        if last in {"-command", "-c"}:
+            return argv
+        return [*argv, "-Command"]
+    if last == "-c":
+        return argv
+    return [*argv, "-c"]
@@ -0,0 +1,443 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Persistent shell session.
+
+A :class:`ShellSession` launches a long-lived shell subprocess and executes
+commands one at a time by writing them to stdin followed by a **sentinel
+probe** that reports the exit status. Reading stdout until the sentinel
+appears gives a reliable command boundary without relying on job control
+or PTYs, which keeps the same code path working on bash, sh, and pwsh.
+
+**Single-owner contract.** A :class:`ShellSession` is owned by exactly one
+conversation / agent session — i.e. one user. The backing shell process
+carries mutable state (cwd, exported variables, history, background jobs)
+that every subsequent command can observe, and the internal
+``asyncio.Lock`` serializes every call onto the single stdin/stdout pipe.
+There is no per-caller isolation. The enclosing shell tool must not share
+a single session across users, tenants, or concurrent conversations; it
+must create one session per agent session and close it when the session
+ends.
+
+Notes:
+* ``pwsh -NoProfile -NoLogo -NonInteractive -Command -`` waits for a
+  complete parse before executing, so multi-line ``try`` blocks stall
+  with stdin open. To avoid that, the user command is base64-encoded
+  and invoked with ``Invoke-Expression`` on a single line.
+* ``Write-Output`` routes through the PowerShell pipeline formatter,
+  which may drop trailing newlines when stdout is redirected. The
+  sentinel is emitted via ``[Console]::WriteLine`` followed by an
+  explicit ``[Console]::Out.Flush()``.
+* ``$LASTEXITCODE`` only tracks external-process exits, so the rc is
+  also derived from ``$?`` and caught exceptions.
+* stdout and stderr are consumed by **persistent reader tasks** that
+  run for the lifetime of the session. Each ``run()`` snapshots buffer
+  offsets before writing the command and scans forward from there.
+  This avoids ``read() called while another coroutine is already
+  waiting`` errors from per-call ``wait_for(stream.read())`` loops and
+  prevents late stderr from being attributed to the next command.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import base64
+import os
+import secrets
+import signal
+import sys
+import time
+from collections.abc import Mapping, Sequence
+
+from ._killtree import kill_process_tree
+from ._resolve import is_powershell
+from ._truncate import truncate_head_tail as _truncate_bytes
+from ._truncate import truncate_text_head_tail as _truncate_text
+from ._types import ShellResult
+
+_READ_CHUNK = 64 * 1024
+_SHUTDOWN_GRACE = 2.0
+# Extra grace window after the sentinel arrives to let late stderr drain.
+_STDERR_QUIESCENCE = 0.05
+
+
+class ShellSession:
+    """A long-lived shell subprocess that executes commands via sentinels.
+
+    The session is thread-unsafe by design but async-safe: concurrent calls
+    to :meth:`run` are serialised with an internal :class:`asyncio.Lock`.
+    """
+
+    def __init__(
+        self,
+        argv: Sequence[str],
+        *,
+        workdir: str | None = None,
+        env: Mapping[str, str] | None = None,
+        max_output_bytes: int = 64 * 1024,
+    ) -> None:
+        self._argv = list(argv)
+        self._workdir = workdir
+        self._env = dict(env) if env is not None else None
+        self._max_output_bytes = max_output_bytes
+        self._proc: asyncio.subprocess.Process | None = None
+        # Serialises per-command execution onto the single stdin/stdout
+        # pipe. This is an ordering primitive within one owning session;
+        # it is NOT a multi-tenant isolation mechanism. ShellSession is
+        # single-owner — see the module docstring. The lock just
+        # guarantees concurrent calls from the one owner queue cleanly
+        # instead of interleaving on the pipe.
+        self._run_lock = asyncio.Lock()
+        # Serialises start/close so concurrent first-callers don't spawn
+        # multiple subprocesses.
+        self._lifecycle_lock = asyncio.Lock()
+        self._sentinel_tag = secrets.token_hex(8)
+        self._is_pwsh = is_powershell(argv)
+
+        # Persistent reader state. The reader tasks append into these
+        # buffers; _run_locked scans forward from a per-call offset.
+        self._stdout_buf = bytearray()
+        self._stderr_buf = bytearray()
+        self._stdout_event = asyncio.Event()
+        self._stdout_reader: asyncio.Task[None] | None = None
+        self._stderr_reader: asyncio.Task[None] | None = None
+        self._stdout_closed = False
+
+    # ------------------------------------------------------------------ lifecycle
+
+    async def start(self) -> None:
+        """Spawn the shell if it isn't already running."""
+        async with self._lifecycle_lock:
+            if self._proc is not None and self._proc.returncode is None:
+                return
+            popen_kwargs: dict[str, object] = {}
+            if sys.platform == "win32":
+                import subprocess  # noqa: S404  # nosec B404 - Win32 constants only
+
+                popen_kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP  # type: ignore[attr-defined]
+            else:
+                popen_kwargs["start_new_session"] = True
+
+            self._proc = await asyncio.create_subprocess_exec(
+                *self._argv,
+                stdin=asyncio.subprocess.PIPE,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                cwd=self._workdir,
+                env=self._env,
+                **popen_kwargs,  # type: ignore[arg-type]
+            )
+
+            # Reset buffer state in case this is a restart after close().
+            self._stdout_buf.clear()
+            self._stderr_buf.clear()
+            self._stdout_event = asyncio.Event()
+            self._stdout_closed = False
+
+            if self._proc.stdout is None or self._proc.stderr is None:
+                raise RuntimeError("subprocess pipes were not created; stdout/stderr unavailable")
+            self._stdout_reader = asyncio.create_task(self._reader(self._proc.stdout, self._stdout_buf, is_stdout=True))
+            self._stderr_reader = asyncio.create_task(
+                self._reader(self._proc.stderr, self._stderr_buf, is_stdout=False)
+            )
+
+            # Best-effort: make PowerShell emit UTF-8 and fail loudly on errors.
+            if self._is_pwsh:
+                await self._write_raw(
+                    "$OutputEncoding = [Console]::OutputEncoding = "
+                    "[System.Text.UTF8Encoding]::new($false);"
+                    "$ErrorActionPreference = 'Stop'\n"
+                )
+
+    async def close(self) -> None:
+        """Terminate the shell cleanly, falling back to SIGKILL."""
+        async with self._lifecycle_lock:
+            proc = self._proc
+            self._proc = None
+            if proc is None or proc.returncode is not None:
+                await self._cancel_readers()
+                return
+            try:
+                if proc.stdin is not None and not proc.stdin.is_closing():
+                    try:
+                        proc.stdin.write(b"exit\n")
+                        await proc.stdin.drain()
+                        proc.stdin.close()
+                    except (ConnectionResetError, BrokenPipeError):
+                        pass
+                try:
+                    await asyncio.wait_for(proc.wait(), timeout=_SHUTDOWN_GRACE)
+                except asyncio.TimeoutError:
+                    await kill_process_tree(proc, grace=_SHUTDOWN_GRACE)
+            except Exception:  # nosec B110 - best-effort shutdown; falls through to forced kill in finally
+                pass
+            finally:
+                await self._cancel_readers()
+
+    async def _cancel_readers(self) -> None:
+        for t in (self._stdout_reader, self._stderr_reader):
+            if t is not None and not t.done():
+                t.cancel()
+                try:
+                    await t
+                except (asyncio.CancelledError, Exception):
+                    pass
+        self._stdout_reader = None
+        self._stderr_reader = None
+
+    async def __aenter__(self) -> ShellSession:
+        await self.start()
+        return self
+
+    async def __aexit__(self, *_exc: object) -> None:
+        await self.close()
+
+    # ------------------------------------------------------------------ execution
+
+    async def run(
+        self,
+        command: str,
+        *,
+        timeout: float | None,
+    ) -> ShellResult:
+        """Run ``command`` in the live session and return its result."""
+        await self.start()
+        async with self._run_lock:
+            return await self._run_locked(command, timeout=timeout)
+
+    async def _run_locked(self, command: str, *, timeout: float | None) -> ShellResult:
+        if self._proc is None or self._proc.stdin is None:
+            raise RuntimeError("ShellSession is not running; call start() first")
+
+        sentinel = f"__AF_END_{self._sentinel_tag}_{secrets.token_hex(4)}__"
+        script = self._build_script(command, sentinel)
+        # Snapshot current buffer positions so we only attribute output
+        # produced *after* the command is submitted.
+        stdout_offset = len(self._stdout_buf)
+        stderr_offset = len(self._stderr_buf)
+        self._stdout_event.clear()
+
+        started = time.monotonic()
+        try:
+            self._proc.stdin.write(script.encode("utf-8"))
+            await self._proc.stdin.drain()
+        except (ConnectionResetError, BrokenPipeError) as exc:
+            raise RuntimeError("persistent shell session is no longer alive") from exc
+
+        needle = sentinel.encode("utf-8")
+        timed_out = False
+        hard_cap = self._max_output_bytes * 4
+
+        async def _wait_for_sentinel() -> tuple[int, int]:
+            """Return (sentinel_start_index, exit_code) once seen."""
+            while True:
+                idx = self._stdout_buf.find(needle, stdout_offset)
+                if idx != -1:
+                    # Parse trailing ``_<digits>``.
+                    tail_start = idx + len(needle)
+                    # Wait briefly for the rc digits + newline to arrive.
+                    deadline = time.monotonic() + 1.0
+                    while time.monotonic() < deadline:
+                        after = bytes(self._stdout_buf[tail_start:])
+                        nl = after.find(b"\n")
+                        if nl != -1:
+                            break
+                        if self._stdout_closed:
+                            break
+                        self._stdout_event.clear()
+                        try:
+                            await asyncio.wait_for(self._stdout_event.wait(), timeout=0.1)
+                        except asyncio.TimeoutError:
+                            pass
+                    after = bytes(self._stdout_buf[tail_start:])
+                    rc = _parse_rc(after)
+                    return idx, rc
+                if self._stdout_closed:
+                    raise RuntimeError("shell closed stdout before emitting sentinel")
+                if len(self._stdout_buf) - stdout_offset > hard_cap:
+                    raise _SentinelOverflow
+                self._stdout_event.clear()
+                try:
+                    await asyncio.wait_for(self._stdout_event.wait(), timeout=0.5)
+                except asyncio.TimeoutError:
+                    # Keep spinning; timeout is enforced at the wait_for below.
+                    pass
+
+        sentinel_idx: int
+        exit_code: int
+        try:
+            sentinel_idx, exit_code = await asyncio.wait_for(_wait_for_sentinel(), timeout=timeout)
+        except asyncio.TimeoutError:
+            timed_out = True
+            await self._interrupt_current_command()
+            try:
+                sentinel_idx, exit_code = await asyncio.wait_for(_wait_for_sentinel(), timeout=_SHUTDOWN_GRACE)
+            except (asyncio.TimeoutError, RuntimeError, _SentinelOverflow):
+                # Session is unrecoverable; tear it down so the next call
+                # gets a fresh subprocess.
+                await self.close()
+                duration_ms = int((time.monotonic() - started) * 1000)
+                stdout_bytes = bytes(self._stdout_buf[stdout_offset:])
+                stderr_bytes = bytes(self._stderr_buf[stderr_offset:])
+                stdout_str, so_trunc = _truncate_bytes(stdout_bytes, self._max_output_bytes)
+                stderr_str, se_trunc = _truncate_bytes(stderr_bytes, self._max_output_bytes)
+                return ShellResult(
+                    stdout=stdout_str,
+                    stderr=stderr_str,
+                    exit_code=124,
+                    duration_ms=duration_ms,
+                    truncated=so_trunc or se_trunc,
+                    timed_out=True,
+                )
+        except _SentinelOverflow:
+            # Runaway output; recover by interrupting and restarting.
+            await self._interrupt_current_command()
+            await self.close()
+            duration_ms = int((time.monotonic() - started) * 1000)
+            stdout_bytes = bytes(self._stdout_buf[stdout_offset : stdout_offset + hard_cap])
+            stderr_bytes = bytes(self._stderr_buf[stderr_offset:])
+            stdout_str, _ = _truncate_bytes(stdout_bytes, self._max_output_bytes)
+            stderr_str, _ = _truncate_bytes(stderr_bytes, self._max_output_bytes)
+            return ShellResult(
+                stdout=stdout_str,
+                stderr=stderr_str,
+                exit_code=-1,
+                duration_ms=duration_ms,
+                truncated=True,
+                timed_out=False,
+            )
+
+        # Let stderr quiesce briefly — late writes from the completing
+        # command otherwise leak into the *next* run().
+        await asyncio.sleep(_STDERR_QUIESCENCE)
+
+        duration_ms = int((time.monotonic() - started) * 1000)
+        stdout_raw = bytes(self._stdout_buf[stdout_offset:sentinel_idx])
+        stderr_raw = bytes(self._stderr_buf[stderr_offset:])
+
+        stdout_text = stdout_raw.decode("utf-8", errors="replace").rstrip("\r\n")
+        stderr_text = stderr_raw.decode("utf-8", errors="replace")
+
+        stdout_str, stdout_truncated = _truncate_text(stdout_text, self._max_output_bytes)
+        stderr_str, stderr_truncated = _truncate_text(stderr_text, self._max_output_bytes)
+
+        # Trim the persistent buffers: everything we needed has been copied
+        # into stdout_raw/stderr_raw above, so discarding now keeps the
+        # session's memory bounded across many commands. The reader tasks
+        # only ever ``extend()`` these buffers (no offset bookkeeping
+        # outside this method), so resetting them here is safe.
+        del self._stdout_buf[:]
+        del self._stderr_buf[:]
+
+        return ShellResult(
+            stdout=stdout_str,
+            stderr=stderr_str,
+            exit_code=exit_code,
+            duration_ms=duration_ms,
+            truncated=stdout_truncated or stderr_truncated,
+            timed_out=timed_out,
+        )
+
+    # ------------------------------------------------------------------ helpers
+
+    def _build_script(self, command: str, sentinel: str) -> str:
+        if self._is_pwsh:
+            # Base64-encode the command and run it via Invoke-Expression to
+            # work around pwsh's whole-script parse requirement on stdin.
+            # $ErrorActionPreference is set to 'Stop' at session start so
+            # the catch block fires on cmdlet errors as well as parse
+            # failures surfaced by Invoke-Expression itself.
+            encoded = base64.b64encode(command.encode("utf-8")).decode("ascii")
+            return (
+                "& {"
+                " $__af_rc = 0;"
+                " try {"
+                f"   $__af_cmd = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String('{encoded}'));"
+                "   Invoke-Expression $__af_cmd;"
+                "   if ($LASTEXITCODE -ne $null) { $__af_rc = $LASTEXITCODE }"
+                "   elseif (-not $?) { $__af_rc = 1 }"
+                " } catch {"
+                "   [Console]::Error.WriteLine($_.ToString());"
+                "   $__af_rc = 1"
+                " } finally {"
+                f"   [Console]::WriteLine('{sentinel}_' + $__af_rc);"
+                "   [Console]::Out.Flush()"
+                " }"
+                " }\n"
+            )
+        # POSIX shell. Run the user command in a brace-group so its exit
+        # status is captured even if the user previously enabled ``set -e``
+        # — we save and restore the prior errexit state around the trailer
+        # so ``set -e`` (and other shell options) persist across commands
+        # exactly as the user configured them.
+        return (
+            f"__af_e=$-; set +e; {{ {command}\n}}; __af_rc=$?;"
+            f' case "$__af_e" in *e*) set -e;; esac;'
+            f" printf '\\n{sentinel}_%s\\n' \"$__af_rc\"\n"
+        )
+
+    async def _write_raw(self, text: str) -> None:
+        if self._proc is None or self._proc.stdin is None:
+            return
+        self._proc.stdin.write(text.encode("utf-8"))
+        await self._proc.stdin.drain()
+
+    async def _reader(
+        self,
+        stream: asyncio.StreamReader,
+        buf: bytearray,
+        *,
+        is_stdout: bool,
+    ) -> None:
+        """Persistent reader task: drains ``stream`` into ``buf`` until EOF."""
+        try:
+            while True:
+                chunk = await stream.read(_READ_CHUNK)
+                if not chunk:
+                    if is_stdout:
+                        self._stdout_closed = True
+                        self._stdout_event.set()
+                    return
+                buf.extend(chunk)
+                if is_stdout:
+                    self._stdout_event.set()
+        except asyncio.CancelledError:
+            raise
+        except Exception:
+            if is_stdout:
+                self._stdout_closed = True
+                self._stdout_event.set()
+
+    async def _interrupt_current_command(self) -> None:
+        if self._proc is None or self._proc.returncode is not None:
+            return
+        try:
+            if sys.platform == "win32":
+                self._proc.send_signal(signal.CTRL_BREAK_EVENT)  # type: ignore[attr-defined]
+            else:
+                os.killpg(os.getpgid(self._proc.pid), signal.SIGINT)
+        except (ProcessLookupError, PermissionError, OSError):
+            pass
+
+
+def _parse_rc(after: bytes) -> int:
+    """Parse ``_<digits>`` trailing the sentinel. Returns -1 on failure."""
+    if not after.startswith(b"_"):
+        return -1
+    digits = bytearray()
+    for b in after[1:]:
+        if b == ord("\n") or b == ord("\r"):
+            break
+        if 48 <= b <= 57 or b == ord("-"):
+            digits.append(b)
+        else:
+            break
+    if not digits:
+        return -1
+    try:
+        return int(digits.decode("ascii"))
+    except ValueError:
+        return -1
+
+
+class _SentinelOverflow(RuntimeError):
+    """Internal signal that the sentinel was never seen within the soft cap."""
@@ -0,0 +1,317 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""High-level :class:`LocalShellTool` facade."""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from collections.abc import Callable, Mapping, Sequence
+from typing import Literal
+
+from agent_framework import FunctionTool, tool
+from agent_framework._tools import SHELL_TOOL_KIND_VALUE
+
+from ._executor import run_stateless
+from ._policy import ShellPolicy, ShellRequest
+from ._resolve import is_powershell, resolve_shell
+from ._session import ShellSession
+from ._types import ShellCommandError, ShellMode, ShellResult
+
+logger = logging.getLogger(__name__)
+
+
+def _quote_posix(value: str) -> str:
+    r"""Return ``value`` wrapped in POSIX single quotes.
+
+    Single-quoted strings have no interpolation in POSIX shells. Embedded
+    single quotes are handled by closing the quote, inserting an escaped
+    quote, and reopening: ``a'b`` becomes ``'a'\''b'``.
+    """
+    return "'" + value.replace("'", "'\\''") + "'"
+
+
+def _quote_powershell(value: str) -> str:
+    """Return ``value`` wrapped in PowerShell single quotes.
+
+    Single-quoted strings in PowerShell are literal — ``$`` and ``"`` carry
+    no special meaning. Embedded single quotes are doubled (``''``).
+    """
+    return "'" + value.replace("'", "''") + "'"
+
+
+_PERSISTENT_DESCRIPTION = (
+    "Execute a single shell command on the local machine and return its "
+    "stdout, stderr, and exit code. Commands run in a persistent session so "
+    "`cd` and environment variables from previous calls are preserved. "
+    "Approval is required by default."
+)
+
+_STATELESS_DESCRIPTION = (
+    "Execute a single shell command on the local machine and return its "
+    "stdout, stderr, and exit code. Each command runs in a fresh subprocess, "
+    "so `cd` and environment variables do not persist between calls. "
+    "Approval is required by default."
+)
+
+
+def _default_description(mode: ShellMode) -> str:
+    return _PERSISTENT_DESCRIPTION if mode == "persistent" else _STATELESS_DESCRIPTION
+
+
+class LocalShellTool:
+    """Cross-OS local shell tool that plugs into any agent-framework chat client.
+
+    Typical use::
+
+        shell = LocalShellTool()
+        agent = Agent(
+            client=client,
+            tools=[client.get_shell_tool(func=shell.as_function())],
+        )
+
+    Or as an async context manager (recommended in persistent mode so the
+    session is cleaned up on exit)::
+
+        async with LocalShellTool() as shell:
+            ...
+
+    **Single-session ownership.** A persistent-mode :class:`LocalShellTool`
+    is owned by a single conversation / agent session — i.e. one user.
+    The backing shell process carries mutable state (cwd, exported
+    variables, shell history, background jobs) that every subsequent
+    command can observe, and a single stdin/stdout pipe serializes every
+    call. Do not share one instance across users, tenants, or concurrent
+    conversations: state leaks between them and commands queue behind
+    each other. Create one tool per session, close it (or use ``async
+    with``) when the session ends. If a shared instance is genuinely
+    required, construct with ``mode="stateless"`` so each call spawns a
+    fresh subprocess.
+
+    Args:
+        mode: ``"persistent"`` (default) keeps a single long-lived shell
+            subprocess so ``cd`` / ``export`` carry across calls.
+            ``"stateless"`` spawns a fresh subprocess per call.
+        shell: Optional shell argv override. String values are tokenised.
+            When omitted, the platform default is used (``pwsh`` or
+            ``powershell`` on Windows, ``bash`` or ``sh`` on Unix). May also
+            be overridden via the ``AGENT_FRAMEWORK_SHELL`` env var.
+        workdir: Working directory for commands. Defaults to the current
+            working directory. In persistent mode, each command is
+            re-anchored to this directory when ``confine_workdir=True`` —
+            see that argument for the exact semantics and caveats.
+        confine_workdir: When ``True`` (default), each command in persistent
+            mode is prefixed with a ``cd`` back into ``workdir`` so
+            ``cd``-wandering in one call does not leak to the next. This is
+            a **re-anchor**, not a hard confinement — a command that does
+            ``cd /tmp && rm -rf .`` in one call can still touch ``/tmp``.
+            Use :class:`ShellPolicy` or a sandboxed executor for true
+            confinement.
+        env: Seed environment. In stateless mode this replaces the child's
+            environment unless ``clean_env=False``. In persistent mode the
+            variables are exported before the session is used.
+        clean_env: When ``True``, do **not** inherit ``os.environ``; only
+            the variables supplied in ``env`` are visible to commands.
+        policy: Policy applied before approval. Defaults to an empty
+            :class:`ShellPolicy()` which allows every command; supply
+            explicit ``denylist``/``allowlist`` patterns to filter. The
+            policy is a UX pre-filter, not a security boundary — approval
+            gating + sandbox tier are the real defenses.
+        timeout: Per-command timeout in seconds. ``None`` disables. Default
+            30 s.
+        max_output_bytes: Combined stdout/stderr byte cap before truncation.
+            Default 64 KiB.
+        approval_mode: ``"always_require"`` (default) or ``"never_require"``.
+            Controls the ``FunctionTool.approval_mode`` on the returned
+            function, which the framework uses to gate execution via
+            ``user_input_requests``. **Approval is the actual security
+            boundary of this tool** — disabling it requires
+            ``acknowledge_unsafe=True``.
+        acknowledge_unsafe: Required to be ``True`` if you set
+            ``approval_mode="never_require"``. ``ShellPolicy`` is a UX
+            pre-filter, not a security boundary; without approval the tool
+            will execute any command the model emits.
+        on_command: Optional audit hook called with the command string for
+            every command that passes policy. Use for logging / telemetry.
+    """
+
+    def __init__(
+        self,
+        *,
+        mode: ShellMode = "persistent",
+        shell: str | Sequence[str] | None = None,
+        workdir: str | os.PathLike[str] | None = None,
+        confine_workdir: bool = True,
+        env: Mapping[str, str] | None = None,
+        clean_env: bool = False,
+        policy: ShellPolicy | None = None,
+        timeout: float | None = 30.0,
+        max_output_bytes: int = 64 * 1024,
+        approval_mode: Literal["always_require", "never_require"] = "always_require",
+        acknowledge_unsafe: bool = False,
+        on_command: Callable[[str], None] | None = None,
+    ) -> None:
+        if mode not in ("persistent", "stateless"):
+            raise ValueError(f"mode must be 'persistent' or 'stateless', got {mode!r}")
+        if approval_mode == "never_require" and not acknowledge_unsafe:
+            raise ValueError(
+                "Setting approval_mode='never_require' disables the only built-in "
+                "security boundary of LocalShellTool. If you understand the risk "
+                "(arbitrary commands run on the host with the agent's privileges; "
+                "ShellPolicy is a UX pre-filter, not a defense), pass "
+                "acknowledge_unsafe=True. For untrusted input prefer a "
+                "sandboxed executor (e.g. DockerShellTool or HyperlightCodeActProvider)."
+            )
+        self._mode: ShellMode = mode
+        self._shell_override = shell
+        self._workdir: str | None = os.fspath(workdir) if workdir is not None else None
+        self._confine_workdir = confine_workdir
+        self._policy = policy or ShellPolicy()
+        self._timeout = timeout
+        self._max_output_bytes = max_output_bytes
+        self._approval_mode: Literal["always_require", "never_require"] = approval_mode
+        self._on_command = on_command
+
+        merged_env: dict[str, str] | None
+        if env is None and not clean_env:
+            merged_env = None  # inherit
+        elif clean_env:
+            merged_env = dict(env) if env is not None else {}
+        else:
+            merged_env = {**os.environ, **dict(env or {})}
+        self._env = merged_env
+
+        self._interactive_argv = resolve_shell(self._shell_override, interactive=True)
+        self._stateless_argv = resolve_shell(self._shell_override, interactive=False)
+        self._session: ShellSession | None = None
+        self._session_lock: asyncio.Lock | None = None
+
+    def _get_session_lock(self) -> asyncio.Lock:
+        # Lazily create in the running loop so construction outside a loop is fine.
+        if self._session_lock is None:
+            self._session_lock = asyncio.Lock()
+        return self._session_lock
+
+    # ------------------------------------------------------------------ lifecycle
+
+    async def start(self) -> None:
+        """Eagerly spawn the persistent session (no-op in stateless mode)."""
+        if self._mode != "persistent":
+            return
+        async with self._get_session_lock():
+            if self._session is None:
+                self._session = ShellSession(
+                    self._interactive_argv,
+                    workdir=self._workdir,
+                    env=self._env,
+                    max_output_bytes=self._max_output_bytes,
+                )
+            await self._session.start()
+
+    async def close(self) -> None:
+        """Terminate the persistent session if any."""
+        async with self._get_session_lock():
+            if self._session is not None:
+                try:
+                    await self._session.close()
+                finally:
+                    self._session = None
+
+    async def __aenter__(self) -> LocalShellTool:
+        await self.start()
+        return self
+
+    async def __aexit__(self, *_exc: object) -> None:
+        await self.close()
+
+    # ------------------------------------------------------------------ core run
+
+    async def run(self, command: str, *, timeout: float | None = None) -> ShellResult:
+        """Execute ``command`` directly and return its :class:`ShellResult`.
+
+        Applies policy and the audit hook, but **not** approval (that is
+        handled by the framework when this tool is wrapped via
+        :meth:`as_function`).
+
+        Args:
+            command: The shell command to execute.
+            timeout: Optional per-call timeout in seconds that overrides
+                the tool's configured default. When ``None``, the tool's
+                ``timeout`` setting is used. The timeout is enforced
+                inside the executor (the subprocess is killed / the
+                persistent session tears down the command on timeout)
+                so callers do not need to wrap this call in
+                :func:`asyncio.wait_for`.
+        """
+        request = ShellRequest(command=command, workdir=self._workdir)
+        decision = self._policy.evaluate(request)
+        if decision.decision == "deny":
+            raise ShellCommandError(f"Command rejected by policy: {decision.reason}")
+        if self._on_command is not None:
+            try:
+                self._on_command(command)
+            except Exception:
+                logger.exception("on_command hook raised")
+
+        effective_timeout = self._timeout if timeout is None else timeout
+
+        if self._mode == "persistent":
+            if self._session is None:
+                await self.start()
+            if self._session is None:
+                raise RuntimeError("LocalShellTool session failed to start")
+            effective = self._maybe_reanchor(command)
+            return await self._session.run(effective, timeout=effective_timeout)
+
+        return await run_stateless(
+            self._stateless_argv,
+            command,
+            workdir=self._workdir,
+            env=self._env,
+            timeout=effective_timeout,
+            max_output_bytes=self._max_output_bytes,
+        )
+
+    # ------------------------------------------------------------------ AF wiring
+
+    def as_function(
+        self,
+        *,
+        name: str = "run_shell",
+        description: str | None = None,
+    ) -> FunctionTool:
+        """Return an :class:`~agent_framework.FunctionTool` bound to this instance.
+
+        The returned tool has ``kind="shell"`` so provider-specific
+        ``get_shell_tool(func=...)`` factories recognise it as a local shell.
+        """
+
+        async def _run_shell(command: str) -> str:
+            try:
+                result = await self.run(command)
+            except ShellCommandError as exc:
+                return str(exc)
+            return result.format_for_model()
+
+        effective_description = description or _default_description(self._mode)
+        _run_shell.__doc__ = effective_description
+        return tool(
+            func=_run_shell,
+            name=name,
+            description=effective_description,
+            approval_mode=self._approval_mode,
+            kind=SHELL_TOOL_KIND_VALUE,
+        )
+
+    # ------------------------------------------------------------------ helpers
+
+    def _maybe_reanchor(self, command: str) -> str:
+        """Prefix ``cd`` when confinement is enabled and workdir is set."""
+        if not self._confine_workdir or self._workdir is None:
+            return command
+        # Idempotent prefix: cd back before each command so a `cd` in one
+        # call does not leak workdir state to the next.
+        if self._interactive_argv and is_powershell(self._interactive_argv):
+            return f"Set-Location -LiteralPath {_quote_powershell(self._workdir)}\n{command}"
+        return f"cd -- {_quote_posix(self._workdir)}\n{command}"
@@ -0,0 +1,41 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Head/tail UTF-8 byte truncation shared by the local and Docker shell tools."""
+
+from __future__ import annotations
+
+
+def truncate_head_tail(data: bytes, cap: int) -> tuple[str, bool]:
+    """Truncate ``data`` to ``cap`` bytes, keeping a head and a tail slice.
+
+    Distributes the budget so head receives ``cap // 2`` bytes and tail
+    receives ``cap - cap // 2`` bytes — for an odd ``cap`` the tail keeps
+    the extra byte so no input bytes are silently dropped on the boundary.
+
+    Returns the joined (head, marker, tail) string and a boolean indicating
+    whether truncation occurred.
+
+    Raises ``ValueError`` if ``cap`` is not positive — a non-positive
+    cap has no consistent meaning here and silently treating it as
+    "unlimited" would defeat output-capping assumptions in callers.
+    """
+    if cap <= 0:
+        raise ValueError(f"cap must be positive; got {cap}")
+    if len(data) <= cap:
+        return data.decode("utf-8", errors="replace"), False
+    head_cap = cap // 2
+    tail_cap = cap - head_cap
+    head = data[:head_cap].decode("utf-8", errors="replace")
+    tail = data[len(data) - tail_cap :].decode("utf-8", errors="replace")
+    dropped = len(data) - cap
+    return f"{head}\n[... truncated {dropped} bytes ...]\n{tail}", True
+
+
+def truncate_text_head_tail(text: str, cap: int) -> tuple[str, bool]:
+    """``truncate_head_tail`` for already-decoded text.
+
+    Encodes ``text`` as UTF-8, applies the byte-budgeted head/tail split,
+    and returns a string. UTF-8 decode with ``errors="replace"`` ensures
+    truncation that lands mid-codepoint cannot raise.
+    """
+    return truncate_head_tail(text.encode("utf-8"), cap)
@@ -0,0 +1,59 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Shared types for the local shell tool."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Literal
+
+ShellMode = Literal["persistent", "stateless"]
+
+
+@dataclass(frozen=True)
+class ShellResult:
+    """The outcome of a single shell command invocation.
+
+    Attributes:
+        stdout: Captured standard output, possibly truncated.
+        stderr: Captured standard error, possibly truncated.
+        exit_code: The exit status reported by the shell or subprocess.
+        duration_ms: How long the command took, in milliseconds.
+        truncated: ``True`` when stdout or stderr was truncated to fit
+            ``max_output_bytes``.
+        timed_out: ``True`` when the command was killed because it exceeded
+            the configured timeout.
+    """
+
+    stdout: str
+    stderr: str
+    exit_code: int
+    duration_ms: int
+    truncated: bool = False
+    timed_out: bool = False
+
+    def format_for_model(self) -> str:
+        """Format the result as a single text block suitable for an LLM."""
+        parts: list[str] = []
+        if self.stdout:
+            parts.append(self.stdout)
+        if self.stderr:
+            parts.append(f"stderr: {self.stderr}")
+        if self.truncated:
+            parts.append("[output truncated]")
+        if self.timed_out:
+            parts.append("[command timed out]")
+        parts.append(f"exit_code: {self.exit_code}")
+        return "\n".join(parts)
+
+
+class ShellExecutionError(RuntimeError):
+    """Base class for shell-tool execution failures."""
+
+
+class ShellTimeoutError(ShellExecutionError):
+    """Raised when a command exceeds the configured timeout."""
+
+
+class ShellCommandError(ShellExecutionError):
+    """Raised when a command is rejected by the configured policy."""