Name	Name	Last commit message	Last commit date
parent directory ..
agent_framework_lab_lightning	agent_framework_lab_lightning
assets	assets
samples	samples
tests	tests
.gitattributes	.gitattributes
README.md	README.md

Agent Framework Lab - Lightning

Agent Framework Lab Lightning is a specialized package that integrates Microsoft Agent Framework with Agent-lightning to provide reinforcement learning (RL) training capabilities for AI agents.

This package enables you to train and fine-tune agents using advanced RL algorithms from VERL (e.g., GRPO, PPO, Reinforce++) with support for distributed training, multi-GPU setups, and comprehensive monitoring. It also supports complex multi-turn agent interactions during training and optimization techniques like prompt optimization. See the Agent-lightning documentation for details.

Note: This module is part of the consolidated agent-framework-lab package. Install the package with the lightning extra to use this module.

Installation

Install the agent-framework-lab package with Lightning dependencies:

pip install "agent-framework-lab[lightning]"

Optional Dependencies

# For math-related training
pip install -e ".[lightning,math]"

# For tau2 benchmarking
pip install -e ".[lightning,tau2]"

To prepare for RL training, you'll also need to install dependencies like PyTorch, Ray, and vLLM. See the Agent-lightning setup instructions for more details.

Usage Patterns

The basic usage pattern follows these steps:

Prepare your dataset as a list of samples (typically dictionaries)
Create an agent function that processes samples and returns evaluation scores
Decorate with @agentlightning.rollout to enable training
Configure and run training with the agentlightning.Trainer class

Example Implementation

from agent_framework.lab.lightning import AgentFrameworkTracer
from agentlightning import rollout, Trainer, LLM, Dataset
from agentlightning.algorithm.verl import VERL

TaskType = Any

@rollout
async def math_agent(task: TaskType, llm: LLM) -> float:
    """A function that solves a math problem and returns the evaluation score."""
    async with (
        MCPStdioTool(name="calculator", command="uvx", args=["mcp-server-calculator"]) as mcp_server,
        Agent(
            client=OpenAIChatClient(
                model_id=llm.model,
                api_key="your-api-key",
                base_url=llm.endpoint,
            ),
            name="MathAgent",
            instructions="Solve the math problem and output answer after ###",
            temperature=llm.sampling_parameters.get("temperature", 0.0),
        ) as agent,
    ):
        result = await agent.run(task["question"], tools=mcp_server)
        # Your evaluation logic here...
        return evaluation_score

# Training configuration
config = {
    "data": {"train_batch_size": 8},
    "trainer": {"total_epochs": 2, "n_gpus_per_node": 1},
    # ... additional config
}

# Initialize agent-framework tracer to send telemetry data to agent-lightning's observability backend
tracer = AgentFrameworkTracer()

trainer = Trainer(algorithm=VERL(config), tracer=tracer, n_workers=2)
# Both train_dataset and val_dataset are lists of TaskType
trainer.fit(math_agent, train_dataset, val_data=val_dataset)

Example 1: Training a Math Agent

This example trains an agent that uses an MCP calculator tool to solve math problems. The dataset is a small subset from the Calc-X dataset. The Agent-lightning team has also experimented with a similar agent using a larger dataset. See this example for more details.

Running this example requires a minimum of 40GB GPU memory. If you don't have enough GPU memory, you can use a smaller model like Qwen2.5-0.5B-Instruct, though the results won't be as good. To run the example:

cd samples
# Run the ray cluster (see the troubleshooting section for more details)
ray start --head --dashboard-host=0.0.0.0
# Run the training script
python train_math_agent.py

To debug the agent used in the example, you can run the script with the --debug flag:

python train_math_agent.py --debug

The training curve below shows results with Qwen2.5-1.5B-Instruct and GRPO. Validation accuracy increases from 10% to 35% in the first 8 steps, then begins to overfit.

Example 2: Training a Tau2 Agent

This advanced example demonstrates training on complex multi-agent scenarios using the Tau2 benchmark. It features a multi-agent setup with an assistant agent and a user simulator agent, training the assistant while keeping the user simulator fixed. The example incorporates a multi-step workflow with tool usage and complex evaluation metrics. Currently, training uses the airline domain with a 50/50 split between training and validation data.

Before running this example, please read the agent-lightning-lab-tau2 documentation and follow the setup instructions.

To run the example:

# Set required environment variables
export TAU2_DATA_DIR="/path/to/tau2/data"

# Used for user simulator and LLM judge
export OPENAI_BASE_URL="your-endpoint"
export OPENAI_API_KEY="your-key"

# Used for tracking on Weights & Biases
export WANDB_API_KEY="your-key"

# Run the ray cluster
ray start --head --dashboard-host=0.0.0.0

# Train the tau2 agent
cd samples
python samples/train_tau2_agent.py

# Debug mode
python samples/train_tau2_agent.py --debug

This example uses more advanced Agent-lightning features compared to the math example. It's based on the LitAgent class rather than the @rollout decorator and involves concepts like resources and agent filtering. We recommend reading the Agent-lightning documentation to learn more.

Results with Qwen2.5-1.5B-Instruct and GRPO are shown below. Validation accuracy improves from 28% to 40% over 8 epochs.

Troubleshooting

Ray Connection Issues

Agent-lightning uses VERL for RL training, which depends on Ray. To avoid issues, it's recommended to start Ray manually beforehand. If you encounter Ray startup problems:

# Stop existing Ray processes
ray stop

# Start Ray with debugging enabled
env RAY_DEBUG=legacy HYDRA_FULL_ERROR=1 VLLM_USE_V1=1 ray start --head --dashboard-host=0.0.0.0

Important: Run Ray commands in the same directory as your training script. Set any required environment variables (WANDB_API_KEY, HF_TOKEN) before starting Ray.

GPU Memory Issues

Reduce gpu_memory_utilization to <0.8

Enable FSDP offloading:

"fsdp_config": {
    "param_offload": True,
    "optimizer_offload": True,
}

Decrease batch sizes:
- train_batch_size
- ppo_mini_batch_size
- log_prob_micro_batch_size_per_gpu

Agent Debugging

Always test your agent before training:

# Use debug mode to validate agent behavior
python your_training_script.py --debug

# Check agent responses and evaluation logic
# Ensure proper tool integration and result extraction

Contributing

This package is part of the Microsoft Agent Framework Lab. Please see the main repository for contribution guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Agent Framework Lab - Lightning

Installation

Optional Dependencies

Usage Patterns

Example Implementation

Example 1: Training a Math Agent

Example 2: Training a Tau2 Agent

Troubleshooting

Ray Connection Issues

GPU Memory Issues

Agent Debugging

Contributing

License

FilesExpand file tree

lightning

Directory actions

More options

Directory actions

More options

Latest commit

History

lightning

Folders and files

parent directory

README.md

Agent Framework Lab - Lightning

Installation

Optional Dependencies

Usage Patterns

Example Implementation

Example 1: Training a Math Agent

Example 2: Training a Tau2 Agent

Troubleshooting

Ray Connection Issues

GPU Memory Issues

Agent Debugging

Contributing

License