Python: Add AgentLoopMiddleware for re-running agents in a loop (#6174)

* Python: Add AgentLoopMiddleware for re-running agents in a loop

Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped agent in a loop. A single configurable class covers three common patterns, each with a convenience classmethod factory:

- Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking (`record_feedback`/`progress`), progress injection (`inject_progress`), optional fresh context per iteration (`fresh_context`), and an early-stop completion signal (`is_complete`).
- Predicate (`.with_predicate(...)`): loop while a `should_continue` callable returns True (e.g. paired with `todos_remaining`/`background_tasks_running`).
- Judge (`.with_judge(...)`): a second chat client decides whether the original request was answered, using a `JudgeVerdict` structured-output response.

The loop also auto-resolves pending function-approval / user-input requests via an `on_approval_request` callable (bounded by `max_approval_rounds`), and the next iteration's input is controlled by `next_message`. Supports both streaming and non-streaming runs. Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and `background_tasks_running`. Adds tests, a sample, and docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Refine AgentLoopMiddleware API and sample

- with_judge: add criteria list with {{criteria}} templating into judge instructions plus an agent-side instruction; add fresh_context, additional judge feedback relay; default judge max_iterations.
- should_continue is now required and positional; supports (bool, str|None) feedback tuples surfaced to next_message/record_feedback via feedback kwarg.
- Judge forwards full multi-modal request and response messages.
- Default max_iterations=10 (explicit None = unbounded); removed is_complete and Ralph terminology; ShouldContinueResult is a real TypeAlias.
- Sample: stream all loops, print iteration counts via injected user-block boundaries (robust to function calling), <role>: content formatting, per-method expected output, and a looping todo sample.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix CI checks for AgentLoopMiddleware

- Resolve pyright errors in _loop.py: drop the always-true final_result None check (the while loop always assigns it) and cast finish_reason to the AgentResponse constructor's expected type.
- Apply pyupgrade --py310-plus: import TypeAlias from typing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Resolve mypy/pyright disagreement on finish_reason

pyright infers AgentResponse.finish_reason as including str and rejects the direct assignment, while mypy considers a cast redundant. Drop the cast and suppress only pyright with a targeted reportArgumentType ignore, satisfying both type checkers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Add todo+judge AgentLoopMiddleware sample

Add a second AgentLoopMiddleware sample that composes two criteria in one should_continue predicate: a TodoProvider check (evaluated first) and a report-style judge chat client (evaluated once todos are complete) that grades the assembled report against shared requirements. Register it in the middleware samples README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Compose todo+judge loops as two middleware

Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written predicate. The inner todos_remaining loop drafts the report todo-by-todo and the outer with_judge loop re-runs it until an editor chat client judges the report publication-ready, reusing the built-in helpers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reset session for fresh_context loops via snapshot/restore

AgentLoopMiddleware.fresh_context previously only reset context.messages, so with an attached session each iteration still reloaded the local transcript or re-threaded the service-side conversation id and the model saw the accumulated history. Snapshot the session once before the loop (via to_dict) and restore it (from_dict + field copy) between iterations, so every pass starts from the pre-loop baseline. The final iteration's pass is persisted (no restore after the terminating iteration), so a subsequent agent.run continues from there. Removed the obsolete warning, updated docstrings and core AGENTS.md, and added tests: a snapshot/restore round-trip, a session-reset streaming x fresh_context x inject_progress x store matrix across multiple runs and loop iterations, and response_format parsing across the loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Updated samples and docstrings

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-06-12 16:35:54 +02:00
parent 3f77c555cf
commit 1acd242550
9 changed files with 2519 additions and 0 deletions
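The control flow described in the commit message (re-run the agent, check the `max_iterations` cap before asking `should_continue`, accept either a plain `bool` or a `(bool, str | None)` tuple, and build the next input via `next_message`) can be modeled in isolation. The sketch below is a hypothetical, simplified stand-in, not the middleware's actual `process` implementation: `run_loop` and the string-based `run_agent` callable are illustrative names, and it deliberately omits streaming, sessions, `fresh_context`, progress injection, and the "`next_message` returns `None`" case.

```python
from __future__ import annotations

import asyncio
import inspect
from collections.abc import Awaitable, Callable
from typing import Any

# Same default nudge text as DEFAULT_NEXT_MESSAGE in _harness/_loop.py.
DEFAULT_NEXT_MESSAGE = "Continue working on the task. If it is complete, say so."


async def _maybe_await(value: Any) -> Any:
    # Callbacks may be sync or async; await only when the result is awaitable.
    if inspect.isawaitable(value):
        return await value
    return value


async def run_loop(
    run_agent: Callable[[str], Awaitable[str]],
    should_continue: Callable[..., Any],
    *,
    first_input: str,
    max_iterations: int | None = 10,
    next_message: Callable[..., Any] | None = None,
) -> list[str]:
    """Hypothetical model: re-run ``run_agent`` until the cap or the predicate stops it."""
    outputs: list[str] = []
    message = first_input
    iteration = 0
    while True:
        result = await run_agent(message)
        outputs.append(result)
        iteration += 1  # number of completed runs so far
        # The safety cap takes precedence over the predicate.
        if max_iterations is not None and iteration >= max_iterations:
            break
        decision = await _maybe_await(should_continue(iteration=iteration, last_result=result))
        # Accept either a plain bool or a (bool, feedback) tuple.
        keep_going, feedback = decision if isinstance(decision, tuple) else (bool(decision), None)
        if not keep_going:
            break
        if next_message is None:
            message = DEFAULT_NEXT_MESSAGE
        else:
            # The feedback string from should_continue is surfaced to next_message.
            message = await _maybe_await(
                next_message(iteration=iteration, last_result=result, feedback=feedback)
            )
    return outputs


if __name__ == "__main__":

    async def demo_agent(message: str) -> str:
        return f"echo: {message}"

    # The predicate returns False on iteration 2, so the agent runs twice.
    print(asyncio.run(run_loop(demo_agent, lambda *, iteration, **_: iteration < 2, first_input="hi")))
```

Checking the cap first means a predicate with side effects (such as an LLM judge) is not called after the final allowed run, which is why the judge-driven loop can afford a smaller default cap.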
@@ -116,6 +116,11 @@ agent_framework/
  available, approval requests for known non-approval-required tools are treated as already approved, hidden, stored
  in session state keyed to the visible approval request ids from that batch, and reinjected only when that visible
  approval flow resumes.
+### Agent Loop (`_harness/_loop.py`)
+
+- **`AgentLoopMiddleware`** - `AgentMiddleware` that re-runs an agent in a loop by calling `call_next()` repeatedly (the pipeline re-reads `context.messages` each time). One configurable class covers two patterns: a caller-supplied `should_continue` predicate (required; sync or async; the first positional-or-keyword argument), and a chat-client judge built via the `.with_judge(judge_client, ...)` classmethod factory, which forwards to `__init__` (a second chat client decides whether the original request was answered and the loop continues while it is *not*, using a `JudgeVerdict` structured-output response; internally this is just an async `should_continue` predicate). Supports both streaming and non-streaming runs. By default a non-streaming run returns an aggregated `AgentResponse` containing every iteration's messages plus the injected `next_message` "nudge" messages (as `user` messages); set `return_final_only=True` to return only the last iteration's response. Streaming runs always yield each iteration's updates and emit the injected nudge messages as `user` updates between iterations (`return_final_only` has no effect on streaming, the final response reflects the last iteration, and `MiddlewareTermination` is handled cleanly). `should_continue` is required; the other constructor args are optional: `max_iterations` (safety cap; defaults to `DEFAULT_MAX_ITERATIONS`=10, an explicit `None` means unbounded, and a positive int sets the cap; `.with_judge` defaults to `DEFAULT_JUDGE_MAX_ITERATIONS`=5), `next_message` (defaults to a short "continue" nudge), `return_final_only`, and `additional_instructions` (an extra `system` message injected ahead of the input before the agent runs; it becomes part of the original messages, so it survives `fresh_context` resets and persists via a session). The judge is configured only through `.with_judge` (`judge_client`/`instructions`/`criteria`), not the constructor, and its `reasoning` is fed back to the agent as the next iteration's input; the judge forwards the original request messages and the agent's latest response messages verbatim, so multi-modal content is preserved. `criteria` (a `list[str]`) is both injected as the agent's `additional_instructions` and rendered into the judge instructions wherever the `{{criteria}}` placeholder (`CRITERIA_PLACEHOLDER`) appears (`DEFAULT_JUDGE_INSTRUCTIONS` ends with it; custom `instructions` may include it, and it is stripped when no criteria are given). The `should_continue`/`next_message` callables are invoked with keyword args (`iteration`, `last_result`, `messages`, `original_messages`, `session`, `agent`, `progress`, `feedback`) and may be sync or async; declare only what you need plus `**kwargs`. `should_continue` may return a plain `bool` or a `(bool, str | None)` tuple whose second item is feedback surfaced to `next_message`/`record_feedback` via the `feedback` kwarg (the judge uses this to relay its `reasoning`). Stop precedence per iteration is `max_iterations` → `should_continue`, evaluated before `record_feedback` so the feedback is available to it.
+  - **Feedback tracking** - `record_feedback` captures a per-iteration progress entry (called with the loop kwargs; if it returns a truthy string the entry is appended, otherwise the agent's response text is used as the fallback entry). The accumulated log is exposed to every callback via the `progress` keyword (a per-iteration copy of prior entries) and, when `inject_progress=True` (default), injected into the next iteration's input as a `user` message (the full log without a session, only the latest entry with a session to avoid duplicating history). `fresh_context=True` restarts each iteration from the original task plus the progress log; when a session is attached it is snapshotted (`to_dict()`) before the loop and restored (`from_dict` + field copy) between iterations so the local transcript and any service-side conversation id reset too (in-loop working-state is discarded, pre-loop state preserved, continuity carried only by the progress log).
+- **`todos_remaining(provider)`** / **`background_tasks_running(provider)`** - Helper factories returning `should_continue` predicates that loop while a `TodoProvider` has open items, or while a `BackgroundAgentsProvider`'s persisted state shows running tasks.

 ### Workflows (`_workflows/`)

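The `{{criteria}}` templating described in the Agent Loop entry, together with the judge's plain-text fallback for clients that ignore structured output, can be exercised without a chat client. The following sketch mirrors the logic of `_render_criteria_block`/`with_judge` and the fallback branch of `_judge` in `_harness/_loop.py` from this diff; `render_judge_instructions` and `parse_text_verdict` are illustrative names, not exported APIs.

```python
from __future__ import annotations

from collections.abc import Sequence

# Constants copied from _harness/_loop.py in this diff.
CRITERIA_PLACEHOLDER = "{{criteria}}"
JUDGE_VERDICT_DONE = "VERDICT: DONE"
JUDGE_VERDICT_MORE = "VERDICT: MORE"


def render_judge_instructions(instructions: str, criteria: Sequence[str] | None) -> str:
    """Substitute the criteria bullet block, or strip the placeholder when there are none."""
    block = ""
    if criteria:
        bullets = "\n".join(f"- {item}" for item in criteria)
        block = f"\n\nThe response must satisfy all of the following criteria:\n{bullets}"
    # Instructions without the placeholder are returned unchanged.
    return instructions.replace(CRITERIA_PLACEHOLDER, block)


def parse_text_verdict(reply: str) -> tuple[bool, str | None]:
    """Map a free-text judge reply to (continue_looping, feedback)."""
    text = reply.upper()
    # MORE takes precedence, so an ambiguous or marker-less reply keeps the loop running.
    answered = False if JUDGE_VERDICT_MORE in text else JUDGE_VERDICT_DONE in text
    return (not answered), (reply.strip() or None)
```

A reply with no marker is treated like `VERDICT: MORE`. That is the conservative choice: an unparseable judge reply costs one more iteration (still bounded by `max_iterations`) instead of ending the loop on an incomplete answer.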
@@ -102,6 +102,12 @@ from ._harness._file_access import (
    FileSystemAgentFileStore,
    InMemoryAgentFileStore,
 )
+from ._harness._loop import (
+    AgentLoopMiddleware,
+    JudgeVerdict,
+    background_tasks_running,
+    todos_remaining,
+)
 from ._harness._memory import (
    DEFAULT_MEMORY_SOURCE_ID,
    MemoryContextProvider,
@@ -363,6 +369,7 @@ __all__ = [
    "AgentExecutorResponse",
    "AgentFileStore",
    "AgentFrameworkException",
+    "AgentLoopMiddleware",
    "AgentMiddleware",
    "AgentMiddlewareLayer",
    "AgentMiddlewareTypes",
@@ -454,6 +461,7 @@ __all__ = [
    "InlineSkill",
    "InlineSkillResource",
    "InlineSkillScript",
+    "JudgeVerdict",
    "LocalEvaluator",
    "MCPSkill",
    "MCPSkillResource",
@@ -558,6 +566,7 @@ __all__ = [
    "agent_middleware",
    "annotate_message_groups",
    "apply_compaction",
+    "background_tasks_running",
    "chat_middleware",
    "create_always_approve_tool_response",
    "create_always_approve_tool_with_arguments_response",
@@ -588,6 +597,7 @@ __all__ = [
    "response_handler",
    "set_agent_mode",
    "step",
+    "todos_remaining",
    "tool",
    "tool_call_args_match",
    "tool_called_check",
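With `AgentLoopMiddleware` exported from the package root, callers supply their own callbacks using the keyword-argument convention documented for the loop. A minimal sketch, assuming a hypothetical `TASK COMPLETE` marker that the agent has been instructed to emit; the stand-in `last_result` objects only need a `.text` attribute, as `AgentResponse` has. The two functions would be wired up as `AgentLoopMiddleware(should_continue, record_feedback=record_feedback)`.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any

# Hypothetical marker the agent is instructed to emit once it has finished.
COMPLETION_MARKER = "TASK COMPLETE"


def should_continue(*, iteration: int, last_result: Any, **kwargs: Any) -> tuple[bool, str | None]:
    """Keep looping until the latest response contains the completion marker."""
    if COMPLETION_MARKER in last_result.text:
        return False, None
    # The second item reaches next_message/record_feedback as the ``feedback`` keyword.
    return True, f"Iteration {iteration}: '{COMPLETION_MARKER}' not found yet."


def record_feedback(*, last_result: Any, feedback: str | None = None, **kwargs: Any) -> str | None:
    """Record a terse progress entry (the response's first line) instead of the full text."""
    lines = last_result.text.strip().splitlines()
    summary = lines[0][:80] if lines else ""
    # Fall back to the predicate's feedback, which may itself be None.
    return summary or feedback


if __name__ == "__main__":
    # SimpleNamespace stands in for an AgentResponse; only ``.text`` is read.
    draft = SimpleNamespace(text="Drafted section 1.\nMore to do.")
    print(should_continue(iteration=1, last_result=draft))
    print(record_feedback(last_result=draft))
```

Both callbacks accept `**kwargs`, so they keep working if the loop passes keywords they do not use (`messages`, `session`, `progress`, and so on).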
@@ -0,0 +1,796 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""AgentLoopMiddleware: re-run an agent in a loop until a criterion is met.
+
+This module provides :class:`AgentLoopMiddleware`, an :class:`~agent_framework.AgentMiddleware`
+that repeatedly re-invokes the wrapped agent while a ``should_continue`` predicate says to keep
+going. It serves two common patterns through a single configurable class:
+
+1. A user-supplied ``should_continue`` predicate - for example, keep looping while a response does
+   not yet contain a completion marker, while a :class:`~agent_framework.TodoProvider` still has
+   open items, or while a :class:`~agent_framework.BackgroundAgentsProvider` still has running
+   tasks (see the :func:`todos_remaining` and :func:`background_tasks_running` helpers). The loop
+   can track a **feedback log** across iterations (``record_feedback``): each pass contributes an
+   entry that is exposed to every callback via the ``progress`` keyword and (by default) injected
+   into the next iteration's input. Set ``fresh_context=True`` to restart each pass from the
+   original task plus the progress log (with a session attached, the session is also snapshotted
+   before the loop and restored between iterations so no accumulated history leaks back in).
+   ``max_iterations`` bounds the loop as a safety cap.
+2. A chat-client judge (via :meth:`AgentLoopMiddleware.with_judge`) - a second chat client decides
+   whether the user's original request has been answered (via a :class:`JudgeVerdict` structured
+   output); the loop continues while the answer is "no". This is a convenience wrapper that builds an
+   async ``should_continue`` predicate, so it is a special case of (1).
+
+In every case, the input for the next iteration is controlled by the ``next_message`` callable.
+"""
+
+from __future__ import annotations
+
+import inspect
+from collections.abc import Awaitable, Callable, Sequence
+from typing import TYPE_CHECKING, Any, TypeAlias
+
+from pydantic import BaseModel, Field
+from typing_extensions import Self
+
+from .._feature_stage import ExperimentalFeature, experimental
+from .._middleware import AgentContext, AgentMiddleware, MiddlewareTermination
+from .._types import (
+    AgentResponse,
+    AgentResponseUpdate,
+    AgentRunInputs,
+    Message,
+    ResponseStream,
+    UsageDetails,
+    add_usage_details,
+    normalize_messages,
+)
+
+if TYPE_CHECKING:
+    from .._clients import SupportsChatGetResponse
+
+__all__ = [
+    "AgentLoopMiddleware",
+    "JudgeVerdict",
+    "background_tasks_running",
+    "todos_remaining",
+]
+
+DEFAULT_NEXT_MESSAGE = "Continue working on the task. If it is complete, say so."
+
+# Placeholder substituted with the rendered ``criteria`` block in judge instructions (see
+# :meth:`AgentLoopMiddleware.with_judge`). User-supplied instructions may include it to control
+# where the criteria are inserted; if absent, the criteria are not added to the judge instructions.
+CRITERIA_PLACEHOLDER = "{{criteria}}"
+
+# Verdict markers the judge is asked to emit for clients that do not honor structured output. They
+# are deliberately non-overlapping: neither marker is a substring of the other, nor of the JSON
+# field name ``answered``, so the text fallback in :func:`_build_judge_condition` cannot misclassify
+# a negative verdict (e.g. ``{"answered": false}``) as a positive one.
+JUDGE_VERDICT_DONE = "VERDICT: DONE"
+JUDGE_VERDICT_MORE = "VERDICT: MORE"
+
+DEFAULT_JUDGE_INSTRUCTIONS = (
+    "You are an evaluator. You are given a user's original request and an agent's latest response. "
+    "Decide whether the agent has fully addressed the original request. "
+    "Set 'answered' to true if the request has been fully addressed, or false if more work is still "
+    "required, and use 'reasoning' to briefly justify your decision. "
+    f"If you cannot return structured output, end your reply with a line reading exactly "
+    f"'{JUDGE_VERDICT_DONE}' when the request has been fully addressed or '{JUDGE_VERDICT_MORE}' "
+    f"when more work is still required."
+    "{{criteria}}"
+)
+
+
+def _render_criteria_block(criteria: Sequence[str] | None) -> str:
+    """Render a list of criteria into a bullet block for the judge instructions (``""`` if none)."""
+    if not criteria:
+        return ""
+    bullets = "\n".join(f"- {item}" for item in criteria)
+    return f"\n\nThe response must satisfy all of the following criteria:\n{bullets}"
+
+
+def _criteria_agent_instruction(criteria: Sequence[str]) -> str:
+    """Render the criteria into an extra instruction injected for the agent before each run."""
+    bullets = "\n".join(f"- {item}" for item in criteria)
+    return f"Your response must satisfy all of the following criteria:\n{bullets}"
+
+
+class JudgeVerdict(BaseModel):
+    """Structured verdict returned by the judge chat client."""
+
+    answered: bool = Field(
+        description=(
+            "True if the agent has fully addressed the original request and its response adheres to "
+            "any additional judging criteria, otherwise False."
+        ),
+    )
+    reasoning: str = Field(
+        default="",
+        description="Brief justification for the verdict.",
+    )
+
+
+# Default iteration cap applied when ``max_iterations`` is not provided. Loops are bounded by
+# default to guard against runaway re-invocation; pass ``max_iterations=None`` explicitly to opt
+# into an unbounded loop.
+DEFAULT_MAX_ITERATIONS = 10
+
+# Default iteration cap for judge-driven loops. LLM-judged loops are costly and probabilistic, so
+# they are bounded by a smaller default. Pass ``max_iterations=None`` explicitly to opt into an
+# unbounded judge loop.
+DEFAULT_JUDGE_MAX_ITERATIONS = 5
+
+
+# A callable invoked between iterations. It always receives the loop keyword arguments
+# (``iteration``, ``last_result``, ``messages``, ``original_messages``, ``session``, ``agent``,
+# ``progress``, ``feedback``). Callers declare only the keywords they need plus ``**kwargs`` to
+# ignore the rest. ``should_continue`` may return a plain ``bool`` (continue/stop) or a
+# ``(bool, str | None)`` tuple whose second item is feedback surfaced to the ``next_message`` and
+# ``record_feedback`` callables via the ``feedback`` keyword argument.
+ShouldContinueResult: TypeAlias = "bool | tuple[bool, str | None]"
+ShouldContinueCallable = Callable[..., "ShouldContinueResult | Awaitable[ShouldContinueResult]"]
+NextMessageCallable = Callable[..., "AgentRunInputs | Awaitable[AgentRunInputs | None] | None"]
+
+# A callable invoked once per work iteration to capture a progress-log entry from that iteration. It
+# receives the loop keyword arguments and returns a string entry (appended to the log) or ``None``
+# (record nothing for that iteration).
+FeedbackCallable = Callable[..., "str | Awaitable[str | None] | None"]
+
+
+async def _maybe_await(value: Any) -> Any:
+    """Await ``value`` if it is awaitable, otherwise return it as-is."""
+    if inspect.isawaitable(value):
+        return await value
+    return value
+
+
+def _build_judge_condition(
+    judge_client: SupportsChatGetResponse,
+    instructions: str,
+) -> tuple[ShouldContinueCallable, NextMessageCallable]:
+    """Build the ``should_continue`` predicate and ``next_message`` callable for a judge loop.
+
+    The judge is called directly (no agent tools, session, or middleware) with fresh messages, so
+    the loop's evaluation cannot recurse back through the agent pipeline. The original input messages
+    are forwarded verbatim (rather than collapsed to text) so multi-modal requests are preserved. The
+    judge is asked for a :class:`JudgeVerdict` structured output; if the client does not honor
+    structured output the verdict falls back to the explicit, non-overlapping ``VERDICT: DONE`` /
+    ``VERDICT: MORE`` markers (``MORE`` wins, keeping the loop running, when the marker is ambiguous
+    or absent).
+
+    The predicate returns a ``(continue, reasoning)`` tuple; the loop surfaces that ``reasoning`` to
+    the next-message callable as the ``feedback`` keyword argument, which feeds it back to the agent
+    so it knows *why* its previous answer was judged incomplete.
+    """
+
+    async def _judge(
+        *, last_result: AgentResponse, original_messages: list[Message], **kwargs: Any
+    ) -> tuple[bool, str | None]:
+        judge_messages = [
+            Message(role="system", contents=[instructions]),
+            Message(
+                role="user",
+                contents=["Evaluate the agent's work. The user's original request follows:"],
+            ),
+            *original_messages,
+            Message(role="user", contents=["The agent's latest response was:"]),
+            *last_result.messages,
+            Message(role="user", contents=["Has the original request been fully addressed?"]),
+        ]
+        response = await judge_client.get_response(judge_messages, options={"response_format": JudgeVerdict})
+        verdict = response.value
+        if isinstance(verdict, JudgeVerdict):
+            answered = verdict.answered
+            reasoning = verdict.reasoning
+        else:
+            # Fallback for clients that do not honor structured output: look for the explicit,
+            # non-overlapping verdict markers. ``MORE`` (more work needed) takes precedence so an
+            # ambiguous or marker-less reply keeps looping rather than stopping on an incomplete
+            # answer.
+            text = response.text.upper()
+            answered = False if JUDGE_VERDICT_MORE in text else JUDGE_VERDICT_DONE in text
+            reasoning = response.text.strip()
+        # Continue looping while the request is not yet answered, surfacing the reasoning as feedback.
+        return (not answered), (reasoning or None)
+
+    def _next_message(*, feedback: str | None = None, **kwargs: Any) -> AgentRunInputs:
+        # Feed the judge's reasoning back to the agent so the next iteration addresses the gap.
+        if feedback:
+            return (
+                "An evaluator reviewed your previous response and judged that it does not yet fully "
+                f"address the original request.\n\nEvaluator feedback: {feedback}\n\n"
+                "Revise and continue so the original request is fully addressed."
+            )
+        return DEFAULT_NEXT_MESSAGE
+
+    return _judge, _next_message
+
+
+@experimental(feature_id=ExperimentalFeature.HARNESS)
+class AgentLoopMiddleware(AgentMiddleware):
+    """Re-run an agent in a loop until a criterion is met (or never).
+
+    This middleware repeatedly invokes the wrapped agent. After each run it decides whether to run
+    again based on ``should_continue`` and ``max_iterations``, and uses ``next_message`` to build
+    the input for the next iteration. Use :meth:`with_judge` to drive the loop with a chat-client
+    judge instead of a hand-written predicate.
+
+    By default a non-streaming run returns an aggregated :class:`~agent_framework.AgentResponse`
+    containing every iteration's messages plus the injected ``next_message`` "nudge" messages (set
+    ``return_final_only=True`` to return only the last iteration's response). Streaming runs always
+    yield each iteration's updates and emit the injected nudge messages as ``user`` updates between
+    iterations.
+
+    The ``should_continue`` and ``next_message`` callables are invoked with keyword arguments, so a
+    caller only needs to declare the ones it uses plus ``**kwargs``. The keywords are:
+
+    - ``iteration`` (int): the number of completed runs so far (``1`` after the first run).
+    - ``last_result`` (AgentResponse): the result of the iteration that just completed.
+    - ``messages`` (list[Message]): the messages used for the iteration that just completed.
+    - ``original_messages`` (list[Message]): the input used for the first iteration.
+    - ``session`` (AgentSession | None): the active session, used by the provider helpers.
+    - ``agent``: the agent being looped.
+    - ``progress`` (list[str]): the feedback log accumulated so far (see ``record_feedback``).
+    - ``feedback`` (str | None): the feedback string returned by ``should_continue`` for this
+      iteration (``None`` when it returned a plain bool). ``should_continue`` may return either a
+      ``bool`` or a ``(bool, str | None)`` tuple; the string is surfaced here so ``next_message``
+      and ``record_feedback`` can reference it.
+
+    Examples:
+        .. code-block:: python
+
+            from typing import Any
+
+            from agent_framework import Agent, AgentLoopMiddleware, AgentResponse
+
+
+            async def should_continue(*, iteration: int, last_result: AgentResponse, **kwargs: Any) -> bool:
+                # Stop after three runs, or earlier once the agent reports "DONE".
+                return iteration < 3 and "DONE" not in last_result.text
+
+
+            # ``client`` is any configured chat client.
+            agent = Agent(client=client, middleware=[AgentLoopMiddleware(should_continue)])
+
+    Note:
+        ``max_iterations`` acts as a safety cap and defaults to ``DEFAULT_MAX_ITERATIONS`` (10). Pass
+        an explicit ``None`` to make the loop unbounded, in which case it relies entirely on
+        ``should_continue`` to stop, so make sure the predicate can eventually return ``False``.
+    """
+
+    def __init__(
+        self,
+        should_continue: ShouldContinueCallable,
+        *,
+        max_iterations: int | None = DEFAULT_MAX_ITERATIONS,
+        next_message: NextMessageCallable | None = None,
+        record_feedback: FeedbackCallable | None = None,
+        inject_progress: bool = True,
+        fresh_context: bool = False,
+        return_final_only: bool = False,
+        additional_instructions: str | None = None,
+    ) -> None:
+        """Initialize the agent loop middleware.
+
+        Args:
+            should_continue: Predicate that decides whether to run the agent again. May be sync or
+                async and is called with the loop keyword arguments (``iteration``, ``last_result``,
+                ``messages``, ``original_messages``, ``session``, ``agent``, ``progress``, and
+                ``feedback``; see the class docstring for what each one carries, and declare only the
+                ones you need plus ``**kwargs``). Return ``True``/``False`` to continue/stop, or a
+                ``(bool, str | None)`` tuple to also provide feedback; the feedback string is surfaced
+                to the ``next_message`` and ``record_feedback`` callables via the ``feedback`` keyword
+                argument. To loop on a chat-client judge instead, build the middleware via
+                :meth:`with_judge`.
+
+        Keyword Args:
+            max_iterations: Maximum number of agent runs, used as a safety cap. Defaults to
+                ``DEFAULT_MAX_ITERATIONS`` (10); pass an explicit ``None`` for an unbounded loop, or
+                a positive integer to set a custom cap. (The :meth:`with_judge` factory uses
+                ``DEFAULT_JUDGE_MAX_ITERATIONS`` (5) as its default instead.)
+            next_message: Callable that produces the input for the next iteration, called with the
+                loop keyword arguments. Defaults to a short "continue" nudge. Returning ``None``
+                reuses the previous iteration's messages verbatim (in which case the progress log is
+                *not* injected; see ``inject_progress``).
+            record_feedback: Optional callable invoked once per work iteration to capture a feedback
+                entry. Called as ``record_feedback(**loop_kwargs)``; it returns a string entry that is
+                appended to the progress log, or ``None`` to record nothing for that iteration. When
+                not provided, the iteration's response text (``last_result.text``) is recorded
+                instead. The accumulated log is exposed to every callback via the ``progress`` loop
+                keyword argument. For production loops, prefer a ``record_feedback`` that returns a
+                terse summary rather than relying on the full response text.
+            inject_progress: When ``True`` (default), the accumulated progress log is injected into
+                the next iteration's input as a single ``user`` message ("Progress so far: ..."). To
+                avoid duplication, only the most recent entry is injected when a session is attached
+                (the session already retains earlier turns); the full log is injected when there is
+                no session or ``fresh_context`` is set. When ``False`` the log is only exposed via the
+                ``progress`` loop keyword argument and never injected automatically.
+            fresh_context: When ``True``, each iteration starts from a clean context:
+                ``context.messages`` is reset to the original input messages (plus the injected
+                progress log) instead of accumulating the prior conversation. When a session is
+                attached, the session is snapshotted once before the loop and restored to that
+                pre-loop baseline
+                before each subsequent iteration, so the local transcript and any service-side
+                conversation id are reset too and the agent does not re-read the accumulated history.
+                In-loop working-state mutations are discarded; pre-loop state is preserved; continuity
+                is carried only by the progress log.
+            return_final_only: Controls what a non-streaming run returns. When ``False`` (default),
+                the returned :class:`~agent_framework.AgentResponse` aggregates every iteration: each
+                iteration's response messages plus the injected ``next_message`` "nudge" messages
+                (as ``user`` messages), so the caller sees the full back-and-forth. When ``True``,
+                only the final iteration's :class:`~agent_framework.AgentResponse` is returned. This
+                flag has no effect on streaming runs (the stream cannot know in advance which
+                iteration is last); streaming always yields each iteration's updates and injects the
+                ``next_message`` messages as ``user`` updates between iterations.
+            additional_instructions: Optional extra instruction injected as a ``system`` message
+                ahead of the input messages before the agent runs. It becomes part of the original
+                messages, so it is preserved across ``fresh_context`` resets and (with a session)
+                persists in the session across iterations. Used by :meth:`with_judge` to tell the agent
+                about the criteria its response must satisfy, but available to any loop.
+
+        Raises:
+            ValueError: If ``max_iterations`` is not ``None`` and is less than 1.
+        """
+        if max_iterations is not None and max_iterations < 1:
+            raise ValueError("max_iterations must be None or a positive integer (>= 1).")
+
+        self.max_iterations: int | None = max_iterations
+        self.should_continue: ShouldContinueCallable = should_continue
+        self.next_message = next_message
+        self.record_feedback = record_feedback
+        self.inject_progress = inject_progress
+        self.fresh_context = fresh_context
+        self.return_final_only = return_final_only
+        self.additional_instructions = additional_instructions
+
+    @classmethod
+    def with_judge(
+        cls,
+        judge_client: SupportsChatGetResponse,
+        *,
+        criteria: Sequence[str] | None = None,
+        instructions: str | None = None,
+        max_iterations: int | None = DEFAULT_JUDGE_MAX_ITERATIONS,
+        next_message: NextMessageCallable | None = None,
+        fresh_context: bool = False,
+    ) -> Self:
+        """Create a loop that continues until a judge chat client decides the request was answered.
+
+        Convenience factory for the judge pattern: ``judge_client`` is queried with a
+        :class:`JudgeVerdict` structured-output response after each iteration and the loop continues
+        while the request is *not* answered. The judge's ``reasoning`` is fed back to the agent as
+        the next iteration's input (unless a custom ``next_message`` is provided), so the agent knows
+        why its previous answer was judged incomplete. See :meth:`__init__` for the full meaning of
+        each argument.
+
+        Args:
+            judge_client: Chat client used to judge whether the original request was answered.
+
+        Keyword Args:
+            criteria: Optional list of criteria the response must satisfy. When provided, they are
+                (1) injected as an extra ``system`` instruction for the agent before it runs (via
+                ``additional_instructions``) and (2) rendered into the judge instructions wherever
+                the ``{{criteria}}`` placeholder appears (``CRITERIA_PLACEHOLDER``).
+            instructions: Optional system instructions for the judge. Defaults to
+                ``DEFAULT_JUDGE_INSTRUCTIONS``. May contain the ``{{criteria}}`` placeholder, which
+                is replaced with the rendered ``criteria`` (or removed when no criteria are given).
+            max_iterations: Maximum number of agent runs. Defaults to
+                ``DEFAULT_JUDGE_MAX_ITERATIONS`` (5); pass ``None`` for unbounded, or a positive
+                integer to set a custom cap.
+            next_message: Callable that produces the next iteration's input. Defaults to one that
+                relays the judge's ``reasoning`` back to the agent.
+            fresh_context: When ``True``, each iteration restarts from the original input messages
+                (plus the injected progress log and judge feedback) instead of accumulating the prior
+                conversation; an attached session is snapshotted before the loop and restored to that
+                baseline between iterations. See :meth:`__init__` for the full semantics. Defaults to
+                ``False``.
+        """
+        judge_instructions = (instructions or DEFAULT_JUDGE_INSTRUCTIONS).replace(
+            CRITERIA_PLACEHOLDER, _render_criteria_block(criteria)
+        )
+        should_continue, judge_next_message = _build_judge_condition(judge_client, judge_instructions)
+        return cls(
+            should_continue=should_continue,
+            max_iterations=max_iterations,
+            next_message=next_message or judge_next_message,
+            fresh_context=fresh_context,
+            additional_instructions=_criteria_agent_instruction(criteria) if criteria else None,
+        )
+
+    async def process(
+        self,
+        context: AgentContext,
+        call_next: Callable[[], Awaitable[None]],
+    ) -> None:
+        """Run the wrapped agent in a loop."""
+        if self.additional_instructions is not None:
+            # Inject the extra instruction as a system message ahead of the input so it is present
+            # on every iteration and preserved across fresh_context resets (which restart from
+            # ``original_messages``).
+            context.messages = [
+                Message(role="system", contents=[self.additional_instructions]),
+                *context.messages,
+            ]
+        original_messages = list(context.messages)
+        # For a truly fresh context per iteration the session must also be reset, otherwise the
+        # next run reloads the local transcript or re-threads the service-side conversation and the
+        # model still sees the accumulated history. Snapshot the session once here (the pre-loop
+        # baseline) and restore it before each subsequent iteration so every pass starts clean.
+        snapshot = context.session.to_dict() if self.fresh_context and context.session is not None else None
+        if context.stream:
+            self._process_streaming(context, call_next, original_messages, snapshot)
+        else:
+            await self._process_non_streaming(context, call_next, original_messages, snapshot)
+
+    @staticmethod
+    def _restore_session(session: Any, snapshot: dict[str, Any]) -> None:
+        """Restore a session in place to a previously captured ``to_dict()`` snapshot.
+
+        Re-hydrates the snapshot via :meth:`AgentSession.from_dict` and copies the mutable fields
+        (``service_session_id`` and ``state``) back onto the live ``session`` instance, so any
+        reference held by the agent/context observes the reset. ``session_id`` is preserved (the
+        snapshot carries the same id). A fresh ``from_dict`` is built on every call so repeated
+        restores from one snapshot do not alias the same state dict.
+        """
+        from .._sessions import AgentSession
+
+        restored = AgentSession.from_dict(snapshot)
+        session.service_session_id = restored.service_session_id
+        session.state = restored.state
+
+    async def _process_non_streaming(
+        self,
+        context: AgentContext,
+        call_next: Callable[[], Awaitable[None]],
+        original_messages: list[Message],
+        snapshot: dict[str, Any] | None,
+    ) -> None:
+        iteration = 0
+        work_iterations = 0
+        progress: list[str] = []
+        # Aggregated transcript across iterations: each iteration's response messages plus the
+        # injected "nudge" messages, used to build the combined response when return_final_only=False.
+        aggregated: list[Message] = []
+        aggregated_usage: UsageDetails | None = None
+        final_result: AgentResponse | None = None
+        while True:
+            await call_next()
+            iteration += 1
+
+            result = context.result
+            if not isinstance(result, AgentResponse):
+                raise TypeError(
+                    "AgentLoopMiddleware expected an AgentResponse from a non-streaming run, "
+                    f"got {type(result).__name__}."
+                )
+
+            final_result = result
+            aggregated.extend(result.messages)
+            if result.usage_details is not None:
+                aggregated_usage = add_usage_details(aggregated_usage, result.usage_details)
+
+            messages_used = context.messages
+            loop_kwargs = self._build_loop_kwargs(
+                context=context,
+                iteration=iteration,
+                last_result=result,
+                messages_used=messages_used,
+                original_messages=original_messages,
+                progress=progress,
+            )
+
+            work_iterations += 1
+            # Decide whether to stop and capture any feedback from should_continue first, so the
+            # feedback is available to both the progress and next-message callables this iteration.
+            stop, feedback = await self._evaluate_stop(loop_kwargs, work_iterations)
+            loop_kwargs = self._build_loop_kwargs(
+                context=context,
+                iteration=iteration,
+                last_result=result,
+                messages_used=messages_used,
+                original_messages=original_messages,
+                progress=progress,
+                feedback=feedback,
+            )
+            # Capture this iteration's progress entry, then refresh loop_kwargs so the next-message
+            # resolution sees the latest entry.
+            if await self._record_progress(result, loop_kwargs, progress):
+                loop_kwargs = self._build_loop_kwargs(
+                    context=context,
+                    iteration=iteration,
+                    last_result=result,
+                    messages_used=messages_used,
+                    original_messages=original_messages,
+                    progress=progress,
+                    feedback=feedback,
+                )
+            if stop:
+                break
+            if snapshot is not None and context.session is not None:
+                # Reset the session to the pre-loop baseline so the next run starts fresh; only the
+                # progress log (injected by _resolve_next_message) carries continuity forward.
+                self._restore_session(context.session, snapshot)
+            next_messages = await self._resolve_next_message(loop_kwargs, messages_used, original_messages)
+            context.messages = next_messages
+            aggregated.extend(next_messages)
+
+        if not self.return_final_only:
+            context.result = self._aggregate_response(final_result, aggregated, aggregated_usage)
+
+    def _process_streaming(
+        self,
+        context: AgentContext,
+        call_next: Callable[[], Awaitable[None]],
+        original_messages: list[Message],
+        snapshot: dict[str, Any] | None,
+    ) -> None:
+        # Holds the last iteration's final response so the outer stream's finalizer can return it
+        # rather than an aggregate of every iteration.
+        holder: dict[str, AgentResponse | None] = {"final": None}
+
+        async def _generator() -> Any:
+            iteration = 0
+            work_iterations = 0
+            progress: list[str] = []
+            while True:
+                try:
+                    await call_next()
+                    inner = context.result
+                    if not isinstance(inner, ResponseStream):
+                        raise TypeError(
+                            "AgentLoopMiddleware expected a ResponseStream from a streaming run, "
+                            f"got {type(inner).__name__}."
+                        )
+
+                    async for update in inner:
+                        yield update
+
+                    holder["final"] = await inner.get_final_response()
+                except MiddlewareTermination:
+                    # The pipeline's MiddlewareTermination suppression is no longer active once
+                    # process() has returned (the stream is consumed lazily), so a termination
+                    # raised by a downstream middleware or during stream consumption surfaces here.
+                    # Stop cleanly and keep whatever final response we have from a prior iteration.
+                    return
+
+                iteration += 1
+
+                messages_used = context.messages
+                final = holder["final"]
+                loop_kwargs = self._build_loop_kwargs(
+                    context=context,
+                    iteration=iteration,
+                    last_result=final,
+                    messages_used=messages_used,
+                    original_messages=original_messages,
+                    progress=progress,
+                )
+
+                work_iterations += 1
+                # Decide whether to stop and capture any feedback from should_continue first, so the
+                # feedback is available to both the progress and next-message callables this iteration.
+                stop, feedback = await self._evaluate_stop(loop_kwargs, work_iterations)
+                loop_kwargs = self._build_loop_kwargs(
+                    context=context,
+                    iteration=iteration,
+                    last_result=final,
+                    messages_used=messages_used,
+                    original_messages=original_messages,
+                    progress=progress,
+                    feedback=feedback,
+                )
+                if await self._record_progress(final, loop_kwargs, progress):
+                    loop_kwargs = self._build_loop_kwargs(
+                        context=context,
+                        iteration=iteration,
+                        last_result=final,
+                        messages_used=messages_used,
+                        original_messages=original_messages,
+                        progress=progress,
+                        feedback=feedback,
+                    )
+                if stop:
+                    return
+                if snapshot is not None and context.session is not None:
+                    # Reset the session to the pre-loop baseline before the next run. The final
+                    # response was already awaited above, so the service-side conversation id has
+                    # been propagated and is safe to discard here.
+                    self._restore_session(context.session, snapshot)
+                next_messages = await self._resolve_next_message(loop_kwargs, messages_used, original_messages)
+                context.messages = next_messages
+                # Surface the injected "nudge" messages in the stream so consumers see the user
+                # turns that drive each subsequent iteration (the equivalent of the aggregated
+                # transcript that non-streaming runs return).
+                for message in next_messages:
+                    yield self._message_to_update(message)
+
+        def _finalize(updates: Sequence[AgentResponseUpdate]) -> AgentResponse:
+            if holder["final"] is not None:
+                return holder["final"]
+            return AgentResponse.from_updates(updates)
+
+        context.result = ResponseStream(_generator(), finalizer=_finalize)
+
+    def _build_loop_kwargs(
+        self,
+        *,
+        context: AgentContext,
+        iteration: int,
+        last_result: AgentResponse | None,
+        messages_used: list[Message],
+        original_messages: list[Message],
+        progress: list[str],
+        feedback: str | None = None,
+    ) -> dict[str, Any]:
+        return {
+            "iteration": iteration,
+            "last_result": last_result,
+            "messages": messages_used,
+            "original_messages": original_messages,
+            "session": context.session,
+            "agent": context.agent,
+            # A copy so user callbacks cannot mutate the loop's internal progress log.
+            "progress": list(progress),
+            # Feedback returned by ``should_continue`` for this iteration (``None`` if it returned a
+            # plain bool, or the stop was decided by ``max_iterations``).
+            "feedback": feedback,
+        }
+
+    async def _record_progress(
+        self,
+        last_result: AgentResponse | None,
+        loop_kwargs: dict[str, Any],
+        progress: list[str],
+    ) -> bool:
+        """Capture this iteration's feedback into ``progress``. Returns ``True`` if an entry was added."""
+        if self.record_feedback is not None:
+            entry = await _maybe_await(self.record_feedback(**loop_kwargs))
+        else:
+            entry = last_result.text.strip() if last_result is not None else None
+        if entry:
+            progress.append(entry)
+            return True
+        return False
+
+    async def _evaluate_stop(self, loop_kwargs: dict[str, Any], work_iterations: int) -> tuple[bool, str | None]:
+        """Decide whether the loop should stop, returning ``(stop, feedback)``.
+
+        ``max_iterations`` is a safety cap that short-circuits before ``should_continue`` is
+        evaluated (so an expensive predicate/judge is not called once the cap has fired). Any
+        feedback returned by ``should_continue`` is propagated so the progress and next-message
+        callables can reference it.
+        """
+        if self.max_iterations is not None and work_iterations >= self.max_iterations:
+            return True, None
+        keep_going, feedback = await self._should_continue(loop_kwargs)
+        return (not keep_going), feedback
+
+    async def _should_continue(self, loop_kwargs: dict[str, Any]) -> tuple[bool, str | None]:
+        """Evaluate the predicate, normalizing its result to ``(continue, feedback)``."""
+        result = await _maybe_await(self.should_continue(**loop_kwargs))
+        return (bool(result[0]), result[1]) if isinstance(result, tuple) else (bool(result), None)  # type: ignore
+
+    @staticmethod
+    def _message_to_update(message: Message) -> AgentResponseUpdate:
+        """Wrap an injected loop message as a streaming update so consumers see it inline."""
+        return AgentResponseUpdate(
+            contents=message.contents,
+            role=message.role,
+            author_name=message.author_name,
+            message_id=message.message_id,
+        )
+
+    @staticmethod
+    def _aggregate_response(
+        final: AgentResponse,
+        messages: list[Message],
+        usage: UsageDetails | None,
+    ) -> AgentResponse:
+        """Build a combined response carrying every iteration's messages and summed usage.
+
+        Metadata (``response_id``, structured ``value``, etc.) is taken from the final iteration; the
+        structured value is passed through pre-parsed so it is not re-derived from the aggregated text.
+        """
+        return AgentResponse(
+            messages=messages,
+            response_id=final.response_id,
+            agent_id=final.agent_id,
+            created_at=final.created_at,
+            finish_reason=final.finish_reason,  # pyright: ignore[reportArgumentType]
+            usage_details=usage,
+            value=final.value,
+            additional_properties=dict(final.additional_properties) if final.additional_properties else None,
+            raw_representation=final.raw_representation,
+        )
+
+    @staticmethod
+    def _render_progress(entries: list[str]) -> Message:
+        """Format progress-log entries into a single ``user`` message."""
+        body = "\n".join(f"- {entry}" for entry in entries)
+        return Message(role="user", contents=[f"Progress so far:\n{body}"])
+
+    async def _resolve_next_message(
+        self,
+        loop_kwargs: dict[str, Any],
+        messages_used: list[Message],
+        original_messages: list[Message],
+    ) -> list[Message]:
+        # Compute the base next input. A ``next_message`` callable returning None requests a verbatim
+        # reuse of the previous messages (no progress injection); in fresh-context mode that escape
+        # hatch does not apply, so fall back to the default nudge instead.
+        if self.next_message is None:
+            next_msgs = normalize_messages(DEFAULT_NEXT_MESSAGE)
+        else:
+            next_input = await _maybe_await(self.next_message(**loop_kwargs))
+            if next_input is None:
+                if not self.fresh_context:
+                    return list(messages_used)
+                next_msgs = normalize_messages(DEFAULT_NEXT_MESSAGE)
+            else:
+                next_msgs = normalize_messages(next_input)
+
+        progress: list[str] = loop_kwargs.get("progress") or []
+        session = loop_kwargs.get("session")
+        progress_msg: Message | None = None
+        if self.inject_progress and progress:
+            # With a session (and no fresh_context), earlier entries are already retained in the
+            # conversation, so only the latest entry is injected to avoid duplication. Without a
+            # session, or when fresh_context resets the session each pass, inject the full log.
+            entries = progress if (session is None or self.fresh_context) else progress[-1:]
+            progress_msg = self._render_progress(entries)
+
+        if self.fresh_context:
+            result = list(original_messages)
+            if progress_msg is not None:
+                result.append(progress_msg)
+            result.extend(next_msgs)
+            return result
+
+        if progress_msg is not None:
+            return [progress_msg, *next_msgs]
+        return list(next_msgs)
+
+
+def todos_remaining(provider: Any) -> ShouldContinueCallable:
+    """Build a ``should_continue`` predicate that loops while a ``TodoProvider`` has open items.
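+
+    The predicate loads the provider's items for the loop's session and continues while any item is
+    not yet complete; it returns ``False`` (stop) when no session is attached.
+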
+    Args:
+        provider: A :class:`~agent_framework.TodoProvider` attached to the same session as the loop.
+
+    Returns:
+        A predicate suitable for :class:`AgentLoopMiddleware`'s ``should_continue`` argument.
+    """
+
+    async def _should_continue(*, session: Any = None, **kwargs: Any) -> bool:
+        if session is None:
+            return False
+        items = await provider.store.load_items(session, source_id=provider.source_id)
+        return any(not item.is_complete for item in items)
+
+    return _should_continue
+
+
+def background_tasks_running(provider: Any) -> ShouldContinueCallable:
+    """Build a ``should_continue`` predicate that loops while a ``BackgroundAgentsProvider`` is busy.
+
+    The predicate inspects the provider's persisted task state and continues while any task is still
+    marked as running; it returns ``False`` when no session or no task state is present. Pair it with
+    ``max_iterations`` so the loop is guaranteed to stop even if a task's persisted status is never
+    refreshed.
+
+    Args:
+        provider: A :class:`~agent_framework.BackgroundAgentsProvider` attached to the same session
+            as the loop.
+
+    Returns:
+        A predicate suitable for :class:`AgentLoopMiddleware`'s ``should_continue`` argument.
+    """
+    from ._background_agents import BackgroundTaskInfo, BackgroundTaskStatus
+
+    def _should_continue(*, session: Any = None, **kwargs: Any) -> bool:
+        if session is None:
+            return False
+        state = session.state.get(provider.source_id)
+        if not state:
+            return False
+        return any(
+            BackgroundTaskInfo.from_dict(task).status == BackgroundTaskStatus.RUNNING for task in state.get("tasks", [])
+        )
+
+    return _should_continue
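The stop decision implemented by `_evaluate_stop` and `_should_continue` above can be exercised without a model. The sketch below is a standalone approximation, not the library code: `maybe_await`, `normalize_continue`, `evaluate_stop`, `fake_judge`, and `run_loop` are hypothetical names, and each loop pass stands in for one agent run. It shows two behaviors described in the docstrings: the `max_iterations` cap is checked before the predicate (so an expensive judge is not called once the cap fires), and a `(bool, feedback)` tuple is normalized alongside a plain `bool`.

```python
from __future__ import annotations

import asyncio
import inspect
from collections.abc import Callable
from typing import Any


async def maybe_await(value: Any) -> Any:
    """Await ``value`` if it is awaitable (hypothetical stand-in for ``_maybe_await``)."""
    return await value if inspect.isawaitable(value) else value


async def normalize_continue(predicate: Callable[..., Any], **loop_kwargs: Any) -> tuple[bool, str | None]:
    """Normalize a ``bool`` or ``(bool, feedback)`` result to ``(continue, feedback)``."""
    result = await maybe_await(predicate(**loop_kwargs))
    if isinstance(result, tuple):
        return bool(result[0]), result[1]
    return bool(result), None


async def evaluate_stop(
    predicate: Callable[..., Any],
    max_iterations: int | None,
    work_iterations: int,
    **loop_kwargs: Any,
) -> tuple[bool, str | None]:
    """Return ``(stop, feedback)``; the cap is checked before the predicate is called."""
    if max_iterations is not None and work_iterations >= max_iterations:
        return True, None
    keep_going, feedback = await normalize_continue(predicate, **loop_kwargs)
    return not keep_going, feedback


judge_calls: list[int] = []


async def fake_judge(*, iteration: int, **_: Any) -> tuple[bool, str | None]:
    """Hypothetical async judge: rejects iterations 1 and 2, accepts from iteration 3 on."""
    judge_calls.append(iteration)
    if iteration < 3:
        return True, f"iteration {iteration} is missing detail"
    return False, None


async def run_loop(max_iterations: int | None) -> tuple[int, list[str]]:
    """Drive the stop logic; each pass stands in for one agent run."""
    feedback_log: list[str] = []
    work_iterations = 0
    while True:
        work_iterations += 1
        stop, feedback = await evaluate_stop(
            fake_judge, max_iterations, work_iterations, iteration=work_iterations
        )
        if feedback is not None:
            feedback_log.append(feedback)
        if stop:
            return work_iterations, feedback_log


print(asyncio.run(run_loop(max_iterations=10)))  # the judge accepts on pass 3
print(asyncio.run(run_loop(max_iterations=2)))  # the cap fires on pass 2, before the judge runs
```

With `max_iterations=2` the fake judge is called only once, because on the second pass the cap short-circuits before the predicate is evaluated.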
@@ -7,6 +7,10 @@ This folder contains focused middleware samples for `Agent`, chat clients, tools
 | File | Description |
 |------|-------------|
 | [`agent_and_run_level_middleware.py`](./agent_and_run_level_middleware.py) | Demonstrates combining agent-level and run-level middleware. |
+| [`agent_loop_middleware_refinement.py`](./agent_loop_middleware_refinement.py) | Demonstrates `AgentLoopMiddleware` with a `should_continue` predicate: a completion-marker refinement loop with feedback tracking and `fresh_context`. |
+| [`agent_loop_middleware_todos.py`](./agent_loop_middleware_todos.py) | Demonstrates `AgentLoopMiddleware` with a `should_continue` predicate built from a `TodoProvider` via `todos_remaining`, so the agent keeps working while open todos remain. |
+| [`agent_loop_middleware_judge.py`](./agent_loop_middleware_judge.py) | Demonstrates `AgentLoopMiddleware.with_judge`: the agent is re-run until a ChatClient judge decides the original request was answered, with `criteria` shared between the agent and the judge. |
+| [`agent_loop_middleware_report.py`](./agent_loop_middleware_report.py) | Demonstrates composing two `AgentLoopMiddleware` on one agent: an inner `todos_remaining` loop that drafts a report todo-by-todo, wrapped by an outer report-style `with_judge` loop that re-runs it until an editor chat client judges the report publication-ready. |
 | [`chat_middleware.py`](./chat_middleware.py) | Shows class-based and function-based chat middleware that can observe, modify, and override model calls. |
 | [`class_based_middleware.py`](./class_based_middleware.py) | Shows class-based agent and function middleware. |
 | [`decorator_middleware.py`](./decorator_middleware.py) | Demonstrates middleware registration with decorators. |
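The two-middleware composition described for `agent_loop_middleware_report.py` can be illustrated with a toy pipeline that does not import `agent_framework`. This is a sketch under one assumption taken from that sample's docstring, namely that agent middleware run outermost-first; `LoopSketch`, `build_pipeline`, and the todo and judge state are hypothetical. Every outer (judge) pass re-runs the whole inner (todo) loop, so the inner loop's pass count restarts each time.

```python
from __future__ import annotations

import asyncio
import functools
from collections.abc import Awaitable, Callable

CallNext = Callable[[], Awaitable[None]]


class LoopSketch:
    """Hypothetical, simplified loop middleware: re-runs ``call_next`` while ``keep_going()``."""

    def __init__(
        self, name: str, keep_going: Callable[[], bool], trace: list[str], max_iterations: int = 10
    ) -> None:
        self.name = name
        self.keep_going = keep_going
        self.trace = trace
        self.max_iterations = max_iterations

    async def process(self, call_next: CallNext) -> None:
        runs = 0
        while True:
            await call_next()
            runs += 1
            self.trace.append(f"{self.name} pass {runs}")
            # The cap is checked first, mirroring the short-circuit in _evaluate_stop.
            if runs >= self.max_iterations or not self.keep_going():
                return


def build_pipeline(middleware: list[LoopSketch], agent: CallNext) -> CallNext:
    """Wrap from the inside out so ``middleware[0]`` ends up outermost."""
    handler = agent
    for mw in reversed(middleware):
        handler = functools.partial(mw.process, handler)
    return handler


async def demo() -> tuple[int, list[str]]:
    trace: list[str] = []
    state = {"agent_runs": 0, "open_todos": 3, "judge_calls": 0}

    async def agent() -> None:
        # Each agent run completes one open todo.
        state["agent_runs"] += 1
        state["open_todos"] = max(0, state["open_todos"] - 1)

    def judge_rejects() -> bool:
        # Reject the first finished report and reopen one revision todo; accept the second.
        state["judge_calls"] += 1
        if state["judge_calls"] == 1:
            state["open_todos"] = 1
            return True
        return False

    judge_loop = LoopSketch("judge", judge_rejects, trace)
    todo_loop = LoopSketch("todo", lambda: state["open_todos"] > 0, trace)
    await build_pipeline([judge_loop, todo_loop], agent)()
    return state["agent_runs"], trace


print(asyncio.run(demo()))
```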
@@ -0,0 +1,118 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import asyncio
+
+from agent_framework import Agent, AgentLoopMiddleware
+from agent_framework.foundry import FoundryChatClient
+from azure.identity.aio import AzureCliCredential
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+"""
+Agent Loop Middleware: ChatClient judge
+
+This sample demonstrates ``AgentLoopMiddleware.with_judge(...)``: a second chat client decides (via a
+``JudgeVerdict`` structured output) whether the original request was answered, and the loop continues
+while the answer is "no". The judge's ``reasoning`` is fed back to the agent as the next iteration's
+input, so the agent knows what is missing. The loop also passes a list of ``criteria``, which are
+injected as an extra instruction for the agent and rendered into the judge's instructions.
+
+The loop is run with streaming, so the judge's feedback between iterations shows up as a ``user``
+update; the stream is printed as ``<role>: <content>`` lines.
+
+Environment variables:
+    FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL
+    FOUNDRY_MODEL            — Model deployment name
+
+Authentication:
+    Run ``az login`` before running this sample.
+"""
+
+
+async def judge_loop(client: FoundryChatClient, judge_client: FoundryChatClient) -> None:
+    """A second chat client judges whether the request was answered."""
+    print("\n=== ChatClient judge (loop until the request is answered) ===")
+
+    # 1. Provide a ``judge_client``. The middleware asks it (via a ``JudgeVerdict`` structured
+    #    output) whether the original request has been fully addressed and continues while the
+    #    answer is "no". The judge's ``reasoning`` is fed back to the agent as the next iteration's
+    #    input, so the agent knows what is missing. Judge loops default to a small ``max_iterations``
+    #    cap because each pass costs an extra model call.
+    #
+    #    ``criteria`` is a list of requirements the response must satisfy. The loop (a) injects them
+    #    as an extra instruction for the agent before it runs and (b) renders them into the judge's
+    #    instructions (the default judge prompt includes a ``{{criteria}}`` placeholder). Supply your
+    #    own ``instructions`` string with ``{{criteria}}`` to control the wording, or omit ``criteria``
+    #    entirely and pass a plain ``instructions`` string.
+    loop = AgentLoopMiddleware.with_judge(
+        judge_client,
+        criteria=[
+            "Mentions the moon",
+            "Includes at least one good joke",
+            "Is written as a single piece of fluent prose",
+        ],
+        max_iterations=4,
+    )
+
+    agent = Agent(
+        client=client,
+        name="answerer",
+        instructions="You are a helpful assistant. Answer the user's question thoroughly.",
+        middleware=[loop],
+    )
+
+    # 2. Run with streaming; the judge's feedback appears as a ``user`` update between iterations
+    #    until the judge is satisfied (or the iteration cap is reached). Each contiguous ``user``
+    #    block marks the boundary into the next iteration, so we count loop iterations by those
+    #    boundaries (robust to function calling, where one iteration may issue several model calls).
+    iterations = 1
+    in_user_block = False
+    assistant_open = False
+    async for update in agent.run("Explain why the sky is blue and sunsets are red.", stream=True):
+        if update.role == "user":
+            if not in_user_block:
+                iterations += 1
+                in_user_block = True
+            assistant_open = False
+            print(f"\nuser: {update.text}", flush=True)
+            continue
+        in_user_block = False
+        if update.text:
+            if not assistant_open:
+                print("\nassistant: ", end="", flush=True)
+                assistant_open = True
+            print(update.text, end="", flush=True)
+    print(f"\n\nCompleted in {iterations} iteration(s).")
+
+
+async def main() -> None:
+    # A single credential is reused; the judge uses its own client instance.
+    async with AzureCliCredential() as credential:
+        client = FoundryChatClient(credential=credential)
+        judge_client = FoundryChatClient(credential=credential)
+        await judge_loop(client, judge_client)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+
+"""
+Sample output (abridged; exact text varies by model):
+
+=== ChatClient judge (loop until the request is answered) ===
+assistant: The sky is blue because shorter (blue) wavelengths scatter more (Rayleigh scattering).
+user: An evaluator reviewed your previous response and judged that it does not yet fully
+address the original request.
+
+Evaluator feedback: The response does not mention the moon.
+
+Revise and continue so the original request is fully addressed.
+assistant: The sky is blue because shorter (blue) wavelengths scatter more. At sunset, light travels
+through more atmosphere, scattering away blue and leaving red/orange hues. The moon follows the
+sky's colors because the same scattering applies to the light reaching it.
+
+Completed in 2 iteration(s).
+"""
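The `{{criteria}}` templating that `with_judge` applies to the judge instructions can be approximated as follows. `render_criteria_block` is a hypothetical stand-in for the private `_render_criteria_block`, whose exact wording is not shown in this diff; only the placeholder name and the `str.replace` substitution come from the source. With no criteria the placeholder is removed rather than left in the prompt.

```python
from __future__ import annotations

from collections.abc import Sequence

# Assumed to match the library's CRITERIA_PLACEHOLDER, per the with_judge docstring.
CRITERIA_PLACEHOLDER = "{{criteria}}"


def render_criteria_block(criteria: Sequence[str] | None) -> str:
    """Hypothetical renderer: a bulleted block, or "" so the placeholder is simply removed."""
    if not criteria:
        return ""
    bullets = "\n".join(f"- {item}" for item in criteria)
    return f"The response must satisfy every one of these criteria:\n{bullets}"


def build_judge_instructions(template: str, criteria: Sequence[str] | None) -> str:
    # Same shape as with_judge: str.replace substitutes every occurrence of the placeholder.
    return template.replace(CRITERIA_PLACEHOLDER, render_criteria_block(criteria))


TEMPLATE = "Decide whether the original request was fully answered.\n{{criteria}}"

print(build_judge_instructions(TEMPLATE, ["Mentions the moon", "Includes at least one good joke"]))
print(repr(build_judge_instructions(TEMPLATE, None)))
```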
@@ -0,0 +1,121 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import asyncio
+
+from agent_framework import Agent, AgentLoopMiddleware, AgentResponse
+from agent_framework.foundry import FoundryChatClient
+from azure.identity.aio import AzureCliCredential
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+"""
+Agent Loop Middleware: refinement loop (should_continue + feedback tracking)
+
+This sample demonstrates ``AgentLoopMiddleware`` driven by a ``should_continue`` predicate. The loop
+keeps refining a candidate answer until the agent's latest response contains a completion marker. It
+also shows feedback tracking: ``record_feedback`` logs per-iteration progress that is fed into the
+next pass, ``fresh_context`` restarts each pass from the original task plus that log, and
+``max_iterations`` bounds the loop as a safety cap.
+
+``next_message`` controls the input for the next iteration (it defaults to a short "continue" nudge).
+The loop is run with streaming, so the injected messages between iterations show up as ``user``
+updates; the stream is printed as ``<role>: <content>`` lines.
+
+Environment variables:
+    FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL
+    FOUNDRY_MODEL            — Model deployment name
+
+Authentication:
+    Run ``az login`` before running this sample.
+"""
+
+COMPLETE_MARKER = "<promise>COMPLETE</promise>"
+
+
+async def refinement_loop(client: FoundryChatClient) -> None:
+    """Loop while the response does not yet contain a completion marker."""
+    print("\n=== Refinement loop (should_continue marker + feedback tracking, capped at 5) ===")
+
+    # 1. ``should_continue`` keeps the loop running until the agent signals it is done by including
+    #    the completion marker in its latest response. It is called with the loop keyword args and
+    #    returns True to run the agent again.
+    def should_continue(*, last_result: AgentResponse, **kwargs: object) -> bool:
+        return COMPLETE_MARKER not in last_result.text
+
+    # 2. ``record_feedback`` captures a short progress entry each iteration. Returning a non-empty
+    #    string appends it to the log; returning None (or "") records nothing for that iteration.
+    #    Only when ``record_feedback`` is omitted does the loop log the response text instead. The
+    #    accumulated log is injected into the next iteration's input so the agent builds on prior work.
+    def record_feedback(*, iteration: int, last_result: AgentResponse, **kwargs: object) -> str:
+        return f"iteration {iteration}: {last_result.text.strip()[:80]}"
+
+    # 3. ``fresh_context=True`` restarts each pass from the original task plus the progress log, and
+    #    ``max_iterations`` bounds the loop as a safety cap.
+    loop = AgentLoopMiddleware(
+        should_continue,
+        max_iterations=5,
+        record_feedback=record_feedback,
+        fresh_context=True,
+    )
+
+    # 4. Attach the middleware to the agent.
+    agent = Agent(
+        client=client,
+        name="refiner",
+        instructions=(
+            "You are iteratively refining a product name for a note-taking app. Each turn, build on the "
+            "progress log: propose an improved candidate with a short reason. When you are confident the "
+            f"name is final, end your message with the exact marker {COMPLETE_MARKER}."
+        ),
+        middleware=[loop],
+    )
+
+    # 5. Run once with streaming. The middleware drives the iterations, feeding progress forward until
+    #    the agent emits the completion marker or the iteration cap is reached. Each contiguous
+    #    ``user`` block marks the boundary into the next iteration, so we count loop iterations by
+    #    those boundaries (robust to function calling, where one iteration may issue several model
+    #    calls; tool calls/results are never ``user`` updates).
+    iterations = 1
+    in_user_block = False
+    assistant_open = False
+    async for update in agent.run("Suggest a name for a note-taking app.", stream=True):
+        if update.role == "user":
+            if not in_user_block:
+                iterations += 1
+                in_user_block = True
+            assistant_open = False
+            print(f"\nuser: {update.text}", flush=True)
+            continue
+        in_user_block = False
+        if update.text:
+            if not assistant_open:
+                print("\nassistant: ", end="", flush=True)
+                assistant_open = True
+            print(update.text, end="", flush=True)
+    print(f"\n\nCompleted in {iterations} iteration(s).")
+
+
+async def main() -> None:
+    async with AzureCliCredential() as credential:
+        client = FoundryChatClient(credential=credential)
+        await refinement_loop(client)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+
+"""
+Sample output (abridged; exact text varies by model):
+
+=== Refinement loop (should_continue marker + feedback tracking, capped at 5) ===
+assistant: "QuickJot" — short and evokes fast capture.
+user: Suggest a name for a note-taking app.
+user: Progress so far:
+- iteration 1: "QuickJot" — short and evokes fast capture.
+user: Continue working on the task. If it is complete, say so.
+assistant: How about "MarginNote" — it evokes jotting ideas in the margins. <promise>COMPLETE</promise>
+
+Completed in 2 iteration(s).
+"""
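The refinement sample counts loop iterations by watching for contiguous `user` blocks in the stream, so extra model calls from function calling inside one iteration do not inflate the count. That rule can be pulled out into a pure function and checked against the abridged output. This is a minimal sketch: `count_iterations` is a hypothetical helper that takes plain role strings in place of the framework's streaming updates.

```python
from collections.abc import Iterable


def count_iterations(roles: Iterable[str]) -> int:
    """Count loop iterations from a sequence of update roles.

    The first iteration is implicit; each contiguous run of ``user`` updates
    marks the boundary into one more iteration. Any non-user role (assistant,
    tool) closes the current user block.
    """
    iterations = 1
    in_user_block = False
    for role in roles:
        if role == "user":
            if not in_user_block:
                iterations += 1
                in_user_block = True
        else:
            in_user_block = False
    return iterations


# Roles from the abridged refinement output: one assistant turn, a three-message
# user block, then a second assistant turn.
print(count_iterations(["assistant", "user", "user", "user", "assistant"]))  # 2

# Tool traffic inside an iteration never starts a new one.
print(count_iterations(["assistant", "tool", "assistant", "user", "assistant"]))  # 2
```

Because only the transition into a user block increments the counter, the three injected messages between iterations (original task, progress log, continue prompt) count as one boundary.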
@@ -0,0 +1,208 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import asyncio
+
+from agent_framework import (
+    Agent,
+    AgentLoopMiddleware,
+    AgentSession,
+    TodoProvider,
+    todos_remaining,
+)
+from agent_framework.foundry import FoundryChatClient
+from azure.identity.aio import AzureCliCredential
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+"""
+Agent Loop Middleware: todo list + report-style judge, composed as two middleware
+
+This sample demonstrates a more complex ``AgentLoopMiddleware`` setup that composes TWO separate loop
+middleware on a single agent — rather than hand-writing one predicate that does both checks. The
+agent's ``middleware`` list is the composition point:
+
+    middleware=[judge_loop, todo_loop]
+
+Agent middleware executes in list order, outermost first, so ``judge_loop`` wraps ``todo_loop``:
+
+1. ``todo_loop`` (inner) — built from the ``todos_remaining`` helper over a ``TodoProvider``. It
+   re-runs the agent while any todo item is still open, so the agent plans the report and then drafts
+   it one todo at a time. Its final todo assembles and emits the complete report, so when the inner
+   loop stops, its final response is the full report.
+2. ``judge_loop`` (outer) — built from ``AgentLoopMiddleware.with_judge``. Each time the inner todo
+   loop finishes, a separate "editor" chat client reviews the assembled report (via a ``JudgeVerdict``
+   structured output) against a list of report ``criteria``. While the editor is not satisfied, the
+   outer loop re-runs the inner todo loop (the todos are already complete, so it runs the agent once)
+   with the editor's reasoning fed back, and the agent revises the full report.
+
+``with_judge(criteria=...)`` renders the criteria into both the editor's judge instructions and an
+extra instruction injected for the agent, so the agent writes toward the same bar the editor grades
+against. A custom report-style ``instructions`` string frames the judge as an editor reviewing a
+report.
+
+The loop is run with streaming, so the injected messages between iterations show up as ``user``
+updates; the stream is printed as ``<role>: <content>`` lines. Each contiguous ``user`` block (from
+either loop) marks a boundary into another agent run, so the printed count is the total number of
+agent runs across both loops.
+
+Environment variables:
+    FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL
+    FOUNDRY_MODEL            — Model deployment name
+
+Authentication:
+    Run ``az login`` before running this sample.
+"""
+
+# Requirements the finished report must satisfy. Passed as ``criteria`` to ``with_judge``, which
+# renders them into both the editor's judge instructions and an extra instruction for the agent.
+REPORT_REQUIREMENTS = [
+    "Opens with a one-paragraph executive summary.",
+    "Has a clearly titled section for each part of the brief.",
+    "Ends with a short 'Key takeaways' bulleted list.",
+    "Is written in clear, professional prose.",
+]
+
+# Report-style judge instructions. The ``{{criteria}}`` placeholder is replaced by ``with_judge``
+# with the rendered REPORT_REQUIREMENTS block.
+EDITOR_INSTRUCTIONS = (
+    "You are a senior editor reviewing a research report. You are given the user's original brief and "
+    "the report the agent produced. Decide whether the report is publication-ready. Set 'answered' to "
+    "true only if the report is ready, otherwise set it to false and use 'reasoning' to state "
+    "concisely what is missing.{{criteria}}"
+)
+
+
+async def report_loop(client: FoundryChatClient, editor_client: FoundryChatClient) -> None:
+    """Compose a todo loop (inner) and a report-style judge loop (outer) on one agent."""
+    print("\n=== Todo list + report-style judge (two composed middleware) ===")
+
+    # 1. A TodoProvider gives the agent tools to plan and track the report as todo items. A single
+    #    session (created below) keeps this todo state alive across loop iterations.
+    todo_provider = TodoProvider()
+
+    # 2. Inner loop: re-run the agent while the TodoProvider still has open items. ``todos_remaining``
+    #    builds the ``should_continue`` predicate; ``max_iterations=8`` leaves room for the planning
+    #    turn, one drafting turn per todo, and the final assembly turn.
+    todo_loop = AgentLoopMiddleware(
+        todos_remaining(todo_provider),
+        max_iterations=8,
+    )
+
+    # 3. Outer loop: each time the inner todo loop finishes, ``editor_client`` judges the assembled
+    #    report against REPORT_REQUIREMENTS and the loop re-runs the inner loop while it is not yet
+    #    publication-ready. ``with_judge`` injects the criteria for the agent too, and feeds the
+    #    editor's reasoning back as the next iteration's input. ``max_iterations=4`` caps the review rounds.
+    judge_loop = AgentLoopMiddleware.with_judge(
+        editor_client,
+        instructions=EDITOR_INSTRUCTIONS,
+        criteria=REPORT_REQUIREMENTS,
+        max_iterations=4,
+    )
+
+    # 4. Compose the two middleware on the agent. Order matters: ``judge_loop`` is outermost (it wraps
+    #    and re-runs the whole ``todo_loop``), ``todo_loop`` is innermost (it drives the per-todo
+    #    drafting). The agent is told to finish with a dedicated assembly todo so that, when the inner
+    #    loop stops, its final response is the complete report the editor then grades.
+    agent = Agent(
+        client=client,
+        name="report-writer",
+        instructions=(
+            "You are a research writer producing a short report. "
+            "On your FIRST turn, break the report into todo items using your todo tools: one item per "
+            "report section, plus a final 'Assemble and output the complete report' item — then stop, "
+            "do not start writing yet. On EACH SUBSEQUENT turn while todos remain, complete exactly "
+            "ONE remaining todo item, draft its content, and mark it done using your tools — never "
+            "more than one item per turn. When you reach the final assembly item, output the FULL "
+            "report in a single message and mark it done. If an editor later returns feedback, revise "
+            "and output the full report again."
+        ),
+        context_providers=[todo_provider],
+        middleware=[judge_loop, todo_loop],
+    )
+
+    # 5. Run once with streaming. Reuse a single session so todo state persists across iterations.
+    #    Each contiguous ``user`` block marks a boundary into another agent run; both loops inject
+    #    such blocks (todo nudges and editor feedback), so the count is the total number of agent runs.
+    session = AgentSession()
+    prompt = "Write a brief report on the benefits and risks of remote work for software teams."
+    runs = 1
+    in_user_block = False
+    assistant_open = False
+    async for update in agent.run(prompt, session=session, stream=True):
+        if update.role == "user":
+            if not in_user_block:
+                runs += 1
+                in_user_block = True
+            assistant_open = False
+            print(f"\nuser: {update.text}", flush=True)
+            continue
+        in_user_block = False
+        if update.text:
+            if not assistant_open:
+                print("\nassistant: ", end="", flush=True)
+                assistant_open = True
+            print(update.text, end="", flush=True)
+    print(f"\n\nCompleted in {runs} agent run(s).")
+
+    # 6. Inspect the todos the agent created, loaded from the same store the inner loop uses.
+    items = await todo_provider.store.load_items(session, source_id=todo_provider.source_id)
+    print("\nTodos after the run:")
+    for item in items:
+        mark = "x" if item.is_complete else " "
+        print(f"  [{mark}] {item.id}. {item.title}")
+
+
+"""
+Sample output for ``report_loop`` (abridged; exact text varies by model):
+
+=== Todo list + report-style judge (two composed middleware) ===
+assistant: Here is my plan. I'll create todos for each section and a final assembly item.
+user: Continue working on the task. If it is complete, say so.
+...
+assistant: # Remote Work for Software Teams
+
+...
+
+## Benefits
+...
+
+## Risks
+...
+
+## Key takeaways
+- Flexibility improves retention.
+- Async communication needs discipline.
+user: An evaluator reviewed your previous response and judged that it does not yet fully
+address the original request.
+
+Evaluator feedback: Add a one-paragraph executive summary before the first section.
+
+Revise and continue so the original request is fully addressed.
+assistant: # Remote Work for Software Teams
+
+**Executive summary:** ... (revised, now opens with a summary)
+...
+
+Completed in 7 agent run(s).
+
+Todos after the run:
+  [x] 1. Benefits section
+  [x] 2. Risks section
+  [x] 3. Key takeaways
+  [x] 4. Assemble and output the complete report
+"""
+
+
+async def main() -> None:
+    # A single credential is reused; the editor judge uses its own client instance.
+    async with AzureCliCredential() as credential:
+        client = FoundryChatClient(credential=credential)
+        editor_client = FoundryChatClient(credential=credential)
+
+        await report_loop(client, editor_client)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
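The nesting this sample describes (``judge_loop`` outermost, re-running the whole ``todo_loop``) can be modeled without the framework. This is a minimal sketch under stated assumptions: `loop_middleware`, `compose`, and the toy `agent` are hypothetical stand-ins, not the `AgentLoopMiddleware` API. It only illustrates that listing `[judge_loop, todo_loop]` makes the outer loop re-run the entire inner loop, and that once the todos are closed the inner loop runs the agent just once per editor round.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical model: a "run" takes the next input message and returns the response text.
Run = Callable[[str], Awaitable[str]]
Middleware = Callable[[Run], Run]


def loop_middleware(
    should_continue: Callable[[str], bool], max_iterations: int, next_message: str
) -> Middleware:
    """Wrap a run so it is re-invoked while ``should_continue`` holds, up to ``max_iterations``."""

    def wrap(inner: Run) -> Run:
        async def run(prompt: str) -> str:
            result = await inner(prompt)
            iterations = 1
            while iterations < max_iterations and should_continue(result):
                result = await inner(next_message)
                iterations += 1
            return result

        return run

    return wrap


def compose(agent: Run, middleware: list[Middleware]) -> Run:
    # The first list entry is outermost, so wrap from the last entry outwards.
    run = agent
    for wrap in reversed(middleware):
        run = wrap(run)
    return run


# Toy agent: it starts with three open todos and closes one per run; the run that
# closes the last todo emits the report. Editor feedback makes it add a summary.
calls: list[str] = []
state = {"open_todos": 3, "revised": False}


async def agent(prompt: str) -> str:
    calls.append(prompt)
    if prompt == "editor feedback":
        state["revised"] = True
    if state["open_todos"] > 0:
        state["open_todos"] -= 1
    if state["open_todos"] > 0:
        return "drafted one todo"
    return "report with summary" if state["revised"] else "report"


judge_loop = loop_middleware(lambda r: "summary" not in r, max_iterations=4, next_message="editor feedback")
todo_loop = loop_middleware(lambda _: state["open_todos"] > 0, max_iterations=8, next_message="continue")

final = asyncio.run(compose(agent, [judge_loop, todo_loop])("brief"))
print(final)  # report with summary
print(calls)  # ['brief', 'continue', 'continue', 'editor feedback']
```

Here the agent runs four times: three passes driven by the inner todo loop, then a single pass driven by the editor's feedback, after which the outer predicate is satisfied.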
@@ -0,0 +1,129 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import asyncio
+
+from agent_framework import Agent, AgentLoopMiddleware, AgentSession, TodoProvider, todos_remaining
+from agent_framework.foundry import FoundryChatClient
+from azure.identity.aio import AzureCliCredential
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+"""
+Agent Loop Middleware: todo loop (should_continue via a provider helper)
+
+This sample demonstrates ``AgentLoopMiddleware`` driven by a ``should_continue`` predicate built from
+a ``TodoProvider``. The ``todos_remaining`` helper keeps the agent running while it still has open
+todo items, so the agent plans work on its first turn and completes one item per turn afterwards.
+``max_iterations`` bounds the loop as a safety cap, and a single session keeps the todo state across
+iterations. After the run the sample prints the todos the agent created.
+
+The loop is run with streaming, so the injected messages between iterations show up as ``user``
+updates; the stream is printed as ``<role>: <content>`` lines.
+
+Environment variables:
+    FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL
+    FOUNDRY_MODEL            — Model deployment name
+
+Authentication:
+    Run ``az login`` before running this sample.
+"""
+
+
+async def todo_loop(client: FoundryChatClient) -> None:
+    """Loop while a TodoProvider still has open items."""
+    print("\n=== Callable criterion (loop while todos remain) ===")
+
+    # 1. A TodoProvider gives the agent tools to plan and track work as todo items.
+    todo_provider = TodoProvider()
+
+    # 2. ``todos_remaining`` builds a ``should_continue`` predicate that returns True while any todo
+    #    item is still open. ``max_iterations`` guarantees the loop stops even if the agent stalls.
+    loop = AgentLoopMiddleware(
+        todos_remaining(todo_provider),
+        max_iterations=6,
+    )
+
+    agent = Agent(
+        client=client,
+        name="planner",
+        instructions=(
+            "You are a writing assistant working through a todo list. "
+            "On your FIRST turn, break the task into todo items using your todo tools and stop "
+            "(do not start writing yet). On EACH SUBSEQUENT turn, complete exactly ONE remaining "
+            "todo item, write its content, and mark it done using your tools — never complete more "
+            "than one item per turn. When every item is done, give a brief final summary."
+        ),
+        context_providers=[todo_provider],
+        middleware=[loop],
+    )
+
+    # 3. Reuse a single session so todo state persists across loop iterations. Each contiguous
+    #    ``user`` block marks the boundary into the next iteration, so we count loop iterations by
+    #    those boundaries — robust to the function calling this loop relies on (the todo tools issue
+    #    several model calls per iteration, but tool calls/results are never ``user`` updates).
+    session = AgentSession()
+    prompt = "Plan and write a short 3-section blog post about Rayleigh scattering."
+    iterations = 1
+    in_user_block = False
+    assistant_open = False
+    async for update in agent.run(prompt, session=session, stream=True):
+        if update.role == "user":
+            if not in_user_block:
+                iterations += 1
+                in_user_block = True
+            assistant_open = False
+            print(f"\nuser: {update.text}", flush=True)
+            continue
+        in_user_block = False
+        if update.text:
+            if not assistant_open:
+                print("\nassistant: ", end="", flush=True)
+                assistant_open = True
+            print(update.text, end="", flush=True)
+    print(f"\n\nCompleted in {iterations} iteration(s).")
+
+    # 4. Inspect the todos the agent created, loaded from the same store the loop predicate uses.
+    items = await todo_provider.store.load_items(session, source_id=todo_provider.source_id)
+    print("\nTodos after the run:")
+    for item in items:
+        mark = "x" if item.is_complete else " "
+        print(f"  [{mark}] {item.id}. {item.title}")
+
+
+async def main() -> None:
+    async with AzureCliCredential() as credential:
+        client = FoundryChatClient(credential=credential)
+        await todo_loop(client)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+
+"""
+Sample output (abridged; exact text varies by model):
+
+=== Callable criterion (loop while todos remain) ===
+assistant: Here is my plan. I'll create todos for each section.
+user: Progress so far:
+- Here is my plan. I'll create todos for each section.
+user: Continue working on the task. If it is complete, say so.
+assistant: Section 1 drafted. Marking it done.
+user: Progress so far:
+- Section 1 drafted. Marking it done.
+user: Continue working on the task. If it is complete, say so.
+assistant: Section 2 drafted. Marking it done.
+user: Progress so far:
+- Section 2 drafted. Marking it done.
+user: Continue working on the task. If it is complete, say so.
+assistant: Section 3 drafted. Marking it done.
+
+Completed in 4 iteration(s).
+
+Todos after the run:
+  [x] 1. Draft "What light is" section
+  [x] 2. Draft "How Rayleigh scattering works" section
+  [x] 3. Draft "Why the sky is blue" section
+"""
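The todo loop's stopping rule (keep going while any item is open, with `max_iterations` as a safety cap) can be sketched without the framework. `TodoItem`, `open_todos`, and `run_loop` are hypothetical stand-ins, not the `TodoProvider` or `todos_remaining` API; the sketch only shows why a productive agent stops on its own while a stalled one is stopped by the cap.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class TodoItem:
    # Hypothetical stand-in for a todo item: only the fields the sample prints.
    id: int
    title: str
    is_complete: bool = False


def open_todos(items: list[TodoItem]) -> Callable[[], bool]:
    """Build a predicate that stays True while any item is still open."""
    return lambda: any(not item.is_complete for item in items)


def run_loop(turn: Callable[[int], None], should_continue: Callable[[], bool], max_iterations: int) -> int:
    """Run at least one turn, then keep going while the predicate holds, up to the cap."""
    iterations = 0
    while True:
        iterations += 1
        turn(iterations)
        if iterations >= max_iterations or not should_continue():
            return iterations


# A productive agent plans three items on turn 1, then completes one per turn.
items: list[TodoItem] = []


def productive_turn(iteration: int) -> None:
    if iteration == 1:
        items.extend(TodoItem(i, f"Section {i}") for i in (1, 2, 3))
    else:
        next(item for item in items if not item.is_complete).is_complete = True


# A stalled agent plans one item and never completes it.
stalled: list[TodoItem] = []


def stalled_turn(iteration: int) -> None:
    if iteration == 1:
        stalled.append(TodoItem(1, "Section 1"))


productive_count = run_loop(productive_turn, open_todos(items), max_iterations=6)
stalled_count = run_loop(stalled_turn, open_todos(stalled), max_iterations=6)
print(productive_count)  # 4
print(stalled_count)  # 6
```

The productive run matches the sample's planning turn plus one turn per section; the stalled run never satisfies the predicate, so only the cap ends it.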