Files
Eduard van Valkenburg 1acd242550 Python: Add AgentLoopMiddleware for re-running agents in a loop (#6174)
* Python: Add AgentLoopMiddleware for re-running agents in a loop

Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped
agent in a loop. A single configurable class covers three common patterns,
each with a convenience classmethod factory:

- Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking
  (`record_feedback`/`progress`), progress injection (`inject_progress`),
  optional fresh context per iteration (`fresh_context`), and an early-stop
  completion signal (`is_complete`).
- Predicate (`.with_predicate(...)`): loop while a `should_continue` callable
  returns True (e.g. paired with `todos_remaining`/`background_tasks_running`).
- Judge (`.with_judge(...)`): a second chat client decides whether the original
  request was answered, using a `JudgeVerdict` structured-output response.

The loop also auto-resolves pending function-approval / user-input requests via
an `on_approval_request` callable (bounded by `max_approval_rounds`), and the
next iteration's input is controlled by `next_message`. Supports both streaming
and non-streaming runs.

Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and
`background_tasks_running`. Adds tests, a sample, and docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Refine AgentLoopMiddleware API and sample

- with_judge: add criteria list with {{criteria}} templating into judge
  instructions plus an agent-side instruction; add fresh_context, additional
  judge feedback relay; default judge max_iterations.
- should_continue is now required and positional; supports (bool, str|None)
  feedback tuples surfaced to next_message/record_feedback via feedback kwarg.
- Judge forwards full multi-modal request and response messages.
- Default max_iterations=10 (explicit None = unbounded); removed is_complete and
  Ralph terminology; ShouldContinueResult is a real TypeAlias.
- Sample: stream all loops, print iteration counts via injected user-block
  boundaries (robust to function calling), <role>: content formatting, per-method
  expected output, and a looping todo sample.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix CI checks for AgentLoopMiddleware

- Resolve pyright errors in _loop.py: drop the always-true final_result None
  check (the while loop always assigns it) and cast finish_reason to the
  AgentResponse constructor's expected type.
- Apply pyupgrade --py310-plus: import TypeAlias from typing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Resolve mypy/pyright disagreement on finish_reason

pyright infers AgentResponse.finish_reason as including str and rejects the
direct assignment, while mypy considers a cast redundant. Drop the cast and
suppress only pyright with a targeted reportArgumentType ignore, satisfying
both type checkers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Add todo+judge AgentLoopMiddleware sample

Add a second AgentLoopMiddleware sample that composes two criteria in one
should_continue predicate: a TodoProvider check (evaluated first) and a
report-style judge chat client (evaluated once todos are complete) that grades
the assembled report against shared requirements. Register it in the middleware
samples README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Compose todo+judge loops as two middleware

Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent
itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written
predicate. The inner todos_remaining loop drafts the report todo-by-todo and the
outer with_judge loop re-runs it until an editor chat client judges the report
publication-ready, reusing the built-in helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reset session for fresh_context loops via snapshot/restore

AgentLoopMiddleware.fresh_context previously only reset context.messages,
so with an attached session each iteration still reloaded the local
transcript or re-threaded the service-side conversation id and the model
saw the accumulated history. Snapshot the session once before the loop
(via to_dict) and restore it (from_dict + field copy) between iterations,
so every pass starts from the pre-loop baseline. The final iteration's
pass is persisted (no restore after the terminating iteration), so a
subsequent agent.run continues from there.

Removed the obsolete warning, updated docstrings and core AGENTS.md, and
added tests: a snapshot/restore round-trip, a session-reset
streaming x fresh_context x inject_progress x store matrix across multiple
runs and loop iterations, and response_format parsing across the loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Updated samples and docstrings

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-12 14:35:54 +00:00

119 lines
5.0 KiB
Python

# Copyright (c) Microsoft. All rights reserved.
import asyncio
from agent_framework import Agent, AgentLoopMiddleware
from agent_framework.foundry import FoundryChatClient
from azure.identity.aio import AzureCliCredential
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
"""
Agent Loop Middleware: ChatClient judge
This sample demonstrates ``AgentLoopMiddleware.with_judge(...)``: a second chat client decides (via a
``JudgeVerdict`` structured output) whether the original request was answered, and the loop continues
while the answer is "no". The judge's ``reasoning`` is fed back to the agent as the next iteration's
input, so the agent knows what is missing. The loop also passes a list of ``criteria``, which are
injected as an extra instruction for the agent and rendered into the judge's instructions.
The loop is run with streaming, so the judge's feedback between iterations shows up as a ``user``
update; the stream is printed as ``<role>: <content>`` lines.
Environment variables:
FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL
FOUNDRY_MODEL — Model deployment name
Authentication:
Run ``az login`` before running this sample.
"""
async def judge_loop(client: FoundryChatClient, judge_client: FoundryChatClient) -> None:
"""A second chat client judges whether the request was answered."""
print("\n=== ChatClient judge (loop until the request is answered) ===")
# 1. Provide a ``judge_client``. The middleware asks it (via a ``JudgeVerdict`` structured
# output) whether the original request has been fully addressed and continues while the
# answer is "no". The judge's ``reasoning`` is fed back to the agent as the next iteration's
# input, so the agent knows what is missing. Judge loops default to a small ``max_iterations``
# cap because each pass costs an extra model call.
#
# ``criteria`` is a list of requirements the response must satisfy. The loop (a) injects them
# as an extra instruction for the agent before it runs and (b) renders them into the judge's
# instructions (the default judge prompt includes a ``{{criteria}}`` placeholder). Supply your
# own ``instructions`` string with ``{{criteria}}`` to control the wording, or omit ``criteria``
# entirely and pass a plain ``instructions`` string.
loop = AgentLoopMiddleware.with_judge(
judge_client,
criteria=[
"Mentions the moon",
"Includes at least one good joke",
"Is written as a single piece of fluent prose",
],
max_iterations=4,
)
agent = Agent(
client=client,
name="answerer",
instructions="You are a helpful assistant. Answer the user's question thoroughly.",
middleware=[loop],
)
# 2. Run with streaming; the judge's feedback appears as a ``user`` update between iterations
# until the judge is satisfied (or the iteration cap is reached). Each contiguous ``user``
# block marks the boundary into the next iteration, so we count loop iterations by those
# boundaries (robust to function calling, where one iteration may issue several model calls).
iterations = 1
in_user_block = False
assistant_open = False
async for update in agent.run("Explain why the sky is blue and sunsets are red.", stream=True):
if update.role == "user":
if not in_user_block:
iterations += 1
in_user_block = True
assistant_open = False
print(f"\nuser: {update.text}", flush=True)
continue
in_user_block = False
if update.text:
if not assistant_open:
print("\nassistant: ", end="", flush=True)
assistant_open = True
print(update.text, end="", flush=True)
print(f"\n\nCompleted in {iterations} iteration(s).")
async def main() -> None:
# A single credential is reused; the judge uses its own client instance.
async with AzureCliCredential() as credential:
client = FoundryChatClient(credential=credential)
judge_client = FoundryChatClient(credential=credential)
await judge_loop(client, judge_client)
if __name__ == "__main__":
asyncio.run(main())
"""
Sample output (abridged; exact text varies by model):
=== ChatClient judge (loop until the request is answered) ===
assistant: The sky is blue because shorter (blue) wavelengths scatter more (Rayleigh scattering).
user: An evaluator reviewed your previous response and judged that it does not yet fully
address the original request.
Evaluator feedback: The response does not mention the moon.
Revise and continue so the original request is fully addressed.
assistant: The sky is blue because shorter (blue) wavelengths scatter more. At sunset, light travels
through more atmosphere, scattering away blue and leaving red/orange hues. The moon follows the
sky's colors because the same scattering applies to the light reaching it.
Completed in 2 iteration(s).
"""