Files
agent-framework/python/packages/lab/lightning
T
Copilot 3f7ea350dc Python: Add tool call/result content types and update connectors and samples (#2971)
* Add new AI content types and image tool support

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Add Python content types for tool calls/results and image generation tool support

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Address review feedback for tool content and samples

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Tighten image generation typing and sample tools list

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Align image generation output typing

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Handle MCP naming, image options mapping, and connector tool content

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Allow MCP call in function approval request

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Remove raw image_generation tool remapping

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Restore Anthropic tool_use to function calls unless code execution

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Fix lint issues for hosted file docstring and MCP parsing

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Import ChatResponse types in Anthropic client

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Fix Anthropics citation type imports and MCP typing for handoff/tools

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Skip lightning tests without agentlightning and fix function call import

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* fix lint on lab package

* rebuilt anthropic parsing

* redid anthropic parsing

* typo

* updated parsing and added missing docstrings

* fix tests

* mypy fixes

* second mypy fix

* add new class to other samples

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Co-authored-by: eavanvalkenburg <github@vanvalkenburg.eu>
3f7ea350dc ยท 2026-01-08 19:46:32 +00:00
History
..

Agent Framework Lab - Lightning

Agent Framework Lab Lightning is a specialized package that integrates Microsoft Agent Framework with Agent-lightning to provide reinforcement learning (RL) training capabilities for AI agents.

This package enables you to train and fine-tune agents using advanced RL algorithms from VERL (e.g., GRPO, PPO, Reinforce++) with support for distributed training, multi-GPU setups, and comprehensive monitoring. It also supports complex multi-turn agent interactions during training and optimization techniques like prompt optimization. See the Agent-lightning documentation for details.

Note

: This module is part of the consolidated agent-framework-lab package. Install the package with the lightning extra to use this module.

Installation

Install the agent-framework-lab package with Lightning dependencies:

pip install "agent-framework-lab[lightning]"

Optional Dependencies

# For math-related training
pip install -e ".[lightning,math]"

# For tau2 benchmarking
pip install -e ".[lightning,tau2]"

To prepare for RL training, you'll also need to install dependencies like PyTorch, Ray, and vLLM. See the Agent-lightning setup instructions for more details.

Usage Patterns

The basic usage pattern follows these steps:

  1. Prepare your dataset as a list of samples (typically dictionaries)
  2. Create an agent function that processes samples and returns evaluation scores
  3. Decorate with @agentlightning.rollout to enable training
  4. Configure and run training with the agentlightning.Trainer class

Example Implementation

from agent_framework.lab.lightning import AgentFrameworkTracer
from agentlightning import rollout, Trainer, LLM, Dataset
from agentlightning.algorithm.verl import VERL

TaskType = Any

@rollout
async def math_agent(task: TaskType, llm: LLM) -> float:
    """A function that solves a math problem and returns the evaluation score."""
    async with (
        MCPStdioTool(name="calculator", command="uvx", args=["mcp-server-calculator"]) as mcp_server,
        ChatAgent(
            chat_client=OpenAIChatClient(
                model_id=llm.model,
                api_key="your-api-key",
                base_url=llm.endpoint,
            ),
            name="MathAgent",
            instructions="Solve the math problem and output answer after ###",
            temperature=llm.sampling_parameters.get("temperature", 0.0),
        ) as agent,
    ):
        result = await agent.run(task["question"], tools=mcp_server)
        # Your evaluation logic here...
        return evaluation_score

# Training configuration
config = {
    "data": {"train_batch_size": 8},
    "trainer": {"total_epochs": 2, "n_gpus_per_node": 1},
    # ... additional config
}

# Initialize agent-framework tracer to send telemetry data to agent-lightning's observability backend
tracer = AgentFrameworkTracer()

trainer = Trainer(algorithm=VERL(config), tracer=tracer, n_workers=2)
# Both train_dataset and val_dataset are lists of TaskType
trainer.fit(math_agent, train_dataset, val_data=val_dataset)

Example 1: Training a Math Agent

This example trains an agent that uses an MCP calculator tool to solve math problems. The dataset is a small subset from the Calc-X dataset. The Agent-lightning team has also experimented with a similar agent using a larger dataset. See this example for more details.

Running this example requires a minimum of 40GB GPU memory. If you don't have enough GPU memory, you can use a smaller model like Qwen2.5-0.5B-Instruct, though the results won't be as good. To run the example:

cd samples
# Run the ray cluster (see the troubleshooting section for more details)
ray start --head --dashboard-host=0.0.0.0
# Run the training script
python train_math_agent.py

To debug the agent used in the example, you can run the script with the --debug flag:

python train_math_agent.py --debug

The training curve below shows results with Qwen2.5-1.5B-Instruct and GRPO. Validation accuracy increases from 10% to 35% in the first 8 steps, then begins to overfit.

Training Curve

Example 2: Training a Tau2 Agent

This advanced example demonstrates training on complex multi-agent scenarios using the Tau2 benchmark. It features a multi-agent setup with an assistant agent and a user simulator agent, training the assistant while keeping the user simulator fixed. The example incorporates a multi-step workflow with tool usage and complex evaluation metrics. Currently, training uses the airline domain with a 50/50 split between training and validation data.

Before running this example, please read the agent-lightning-lab-tau2 documentation and follow the setup instructions.

To run the example:

# Set required environment variables
export TAU2_DATA_DIR="/path/to/tau2/data"

# Used for user simulator and LLM judge
export OPENAI_BASE_URL="your-endpoint"
export OPENAI_API_KEY="your-key"

# Used for tracking on Weights & Biases
export WANDB_API_KEY="your-key"

# Run the ray cluster
ray start --head --dashboard-host=0.0.0.0

# Train the tau2 agent
cd samples
python samples/train_tau2_agent.py

# Debug mode
python samples/train_tau2_agent.py --debug

This example uses more advanced Agent-lightning features compared to the math example. It's based on the LitAgent class rather than the @rollout decorator and involves concepts like resources and agent filtering. We recommend reading the Agent-lightning documentation to learn more.

Results with Qwen2.5-1.5B-Instruct and GRPO are shown below. Validation accuracy improves from 28% to 40% over 8 epochs.

Training Curve

Troubleshooting

Ray Connection Issues

Agent-lightning uses VERL for RL training, which depends on Ray. To avoid issues, it's recommended to start Ray manually beforehand. If you encounter Ray startup problems:

# Stop existing Ray processes
ray stop

# Start Ray with debugging enabled
env RAY_DEBUG=legacy HYDRA_FULL_ERROR=1 VLLM_USE_V1=1 ray start --head --dashboard-host=0.0.0.0

Important: Run Ray commands in the same directory as your training script. Set any required environment variables (WANDB_API_KEY, HF_TOKEN) before starting Ray.

GPU Memory Issues

  1. Reduce gpu_memory_utilization to <0.8
  2. Enable FSDP offloading:
    "fsdp_config": {
        "param_offload": True,
        "optimizer_offload": True,
    }
    
  3. Decrease batch sizes:
    • train_batch_size
    • ppo_mini_batch_size
    • log_prob_micro_batch_size_per_gpu

Agent Debugging

Always test your agent before training:

# Use debug mode to validate agent behavior
python your_training_script.py --debug

# Check agent responses and evaluation logic
# Ensure proper tool integration and result extraction

Contributing

This package is part of the Microsoft Agent Framework Lab. Please see the main repository for contribution guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.