Commit Graph

  • Python: Add AgentLoopMiddleware for re-running agents in a loop (#6174)
    * Python: Add AgentLoopMiddleware for re-running agents in a loop
    
    Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped
    agent in a loop. A single configurable class covers three common patterns,
    each with a convenience classmethod factory:
    
    - Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking
      (`record_feedback`/`progress`), progress injection (`inject_progress`),
      optional fresh context per iteration (`fresh_context`), and an early-stop
      completion signal (`is_complete`).
    - Predicate (`.with_predicate(...)`): loop while a `should_continue` callable
      returns True (e.g. paired with `todos_remaining`/`background_tasks_running`).
    - Judge (`.with_judge(...)`): a second chat client decides whether the original
      request was answered, using a `JudgeVerdict` structured-output response.
    
    The loop also auto-resolves pending function-approval / user-input requests via
    an `on_approval_request` callable (bounded by `max_approval_rounds`), and the
    next iteration's input is controlled by `next_message`. Supports both streaming
    and non-streaming runs.
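
The predicate pattern and the role of `next_message` described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the middleware's actual implementation: `run_in_loop`, `toy_agent`, the callable signatures, and the iteration cap are all hypothetical stand-ins.

```python
from __future__ import annotations

from collections.abc import Callable


def run_in_loop(
    agent: Callable[[str], str],
    request: str,
    should_continue: Callable[[str], bool],
    next_message: Callable[[str, int], str],
    max_iterations: int = 5,  # hypothetical cap, chosen for this sketch
) -> tuple[str, int]:
    """Re-run `agent` until `should_continue` returns False or the cap is hit."""
    message = request
    iteration = 0
    while True:
        iteration += 1
        response = agent(message)
        if iteration >= max_iterations or not should_continue(response):
            break
        # The next iteration's input is decided by `next_message`.
        message = next_message(response, iteration)
    return response, iteration


# Toy stand-in for an agent that completes one todo per run.
todos = ["outline", "draft", "review"]


def toy_agent(message: str) -> str:
    return f"finished {todos.pop(0)}"


response, iterations = run_in_loop(
    toy_agent,
    "start",
    should_continue=lambda _response: bool(todos),  # loop while todos remain
    next_message=lambda _response, _i: "continue with the next todo",
)
print(response, iterations)
```

The predicate here plays the role that a helper such as `todos_remaining` would play in the real API: the loop stops as soon as no work is left.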
    
    Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and
    `background_tasks_running`. Adds tests, a sample, and docs.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Refine AgentLoopMiddleware API and sample
    
    - with_judge: add a criteria list, templated into the judge instructions via
      {{criteria}}, plus an agent-side instruction; add fresh_context and relay of
      additional judge feedback; set a default max_iterations for the judge loop.
    - should_continue is now required and positional; supports (bool, str|None)
      feedback tuples surfaced to next_message/record_feedback via feedback kwarg.
    - Judge forwards full multi-modal request and response messages.
    - Default max_iterations=10 (explicit None = unbounded); removed is_complete and
      Ralph terminology; ShouldContinueResult is a real TypeAlias.
    - Sample: stream all loops; print iteration counts via injected user-block
      boundaries (robust to function calling); use <role>: content formatting; add
      per-method expected output and a looping todo sample.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Fix CI checks for AgentLoopMiddleware
    
    - Resolve pyright errors in _loop.py: drop the always-true final_result None
      check (the while loop always assigns it) and cast finish_reason to the
      AgentResponse constructor's expected type.
    - Apply pyupgrade --py310-plus: import TypeAlias from typing.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Resolve mypy/pyright disagreement on finish_reason
    
    pyright infers AgentResponse.finish_reason as including str and rejects the
    direct assignment, while mypy considers a cast redundant. Drop the cast and
    suppress only pyright with a targeted reportArgumentType ignore, satisfying
    both type checkers.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Add todo+judge AgentLoopMiddleware sample
    
    Add a second AgentLoopMiddleware sample that composes two criteria in one
    should_continue predicate: a TodoProvider check (evaluated first) and a
    report-style judge chat client (evaluated once todos are complete) that grades
    the assembled report against shared requirements. Register it in the middleware
    samples README.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Compose todo+judge loops as two middleware
    
    Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent
    itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written
    predicate. The inner todos_remaining loop drafts the report todo-by-todo and the
    outer with_judge loop re-runs it until an editor chat client judges the report
    publication-ready, reusing the built-in helpers.
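
The nesting described above (first middleware outermost, so the judge loop wraps the todo loop) can be sketched with plain function wrappers. This is only an illustration of the composition order; `loop_while`, `compose`, and `toy_agent` are hypothetical and are not the framework's API.

```python
from __future__ import annotations

from collections.abc import Callable

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]


def loop_while(should_continue: Callable[[str], bool], max_iterations: int = 10) -> Middleware:
    """Build a wrapper that re-runs the wrapped handler while the predicate holds."""

    def wrap(call_next: Handler) -> Handler:
        def handler(message: str) -> str:
            response = call_next(message)
            for _ in range(max_iterations - 1):
                if not should_continue(response):
                    break
                response = call_next(message)
            return response

        return handler

    return wrap


def compose(middleware: list[Middleware], agent: Handler) -> Handler:
    """Wrap `agent` so that the first middleware in the list is the outermost."""
    handler = agent
    for wrap in reversed(middleware):
        handler = wrap(handler)
    return handler


todos = ["intro", "body", "conclusion"]
report: list[str] = []
calls = {"agent": 0}


def toy_agent(message: str) -> str:
    calls["agent"] += 1
    # Work through todos first; once none remain, do a polishing pass.
    report.append(todos.pop(0) if todos else "polish")
    return " | ".join(report)


todo_loop = loop_while(lambda _response: bool(todos))  # inner: until todos are done
judge_loop = loop_while(lambda response: "polish" not in response)  # outer: toy judge
run = compose([judge_loop, todo_loop], toy_agent)
final = run("Write the report")
print(final, calls["agent"])
```

In this toy run the inner loop drafts the three sections, the outer "judge" rejects the draft once, and the second pass through the inner loop produces the polished version.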
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Reset session for fresh_context loops via snapshot/restore
    
    AgentLoopMiddleware.fresh_context previously only reset context.messages,
    so with an attached session each iteration still reloaded the local
    transcript or re-threaded the service-side conversation id and the model
    saw the accumulated history. Snapshot the session once before the loop
    (via to_dict) and restore it (from_dict + field copy) between iterations,
    so every pass starts from the pre-loop baseline. The final iteration's
    pass is persisted (no restore after the terminating iteration), so a
    subsequent agent.run continues from there.
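
The snapshot/restore sequence above can be sketched with a toy session type. Everything here is an assumption made for illustration: `ToySession`, its two fields, and `restore_in_place` are hypothetical, and only the `to_dict` / `from_dict` naming comes from the commit message.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToySession:
    conversation_id: str | None = None
    messages: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {"conversation_id": self.conversation_id, "messages": list(self.messages)}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ToySession:
        return cls(conversation_id=data["conversation_id"], messages=list(data["messages"]))


def restore_in_place(session: ToySession, snapshot: dict[str, Any]) -> None:
    """Rebuild from the snapshot and copy fields so callers keep the same object."""
    fresh = ToySession.from_dict(copy.deepcopy(snapshot))
    session.conversation_id = fresh.conversation_id
    session.messages = fresh.messages


def run_fresh_context_loop(session: ToySession, iterations: int) -> list[int]:
    snapshot = session.to_dict()  # taken once, before the loop
    history_sizes = []
    for i in range(iterations):
        if i > 0:
            restore_in_place(session, snapshot)  # between iterations only
        history_sizes.append(len(session.messages))  # what the "model" would see
        session.messages.append(f"user: attempt {i + 1}")
        session.messages.append(f"assistant: answer {i + 1}")
    return history_sizes  # no restore after the final iteration


session = ToySession(messages=["user: earlier turn"])
seen = run_fresh_context_loop(session, 3)
print(seen, session.messages)
```

Each iteration starts from the one pre-loop message, and after the loop the session holds the baseline plus the final iteration's exchange, which is what a later run would continue from.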
    
    Removed the obsolete warning, updated docstrings and core AGENTS.md, and
    added tests: a snapshot/restore round-trip, a session-reset test matrix
    (streaming x fresh_context x inject_progress x store) across multiple
    runs and loop iterations, and response_format parsing across the loop.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Updated samples and docstrings
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>