diff --git a/python/packages/lab/README.md b/python/packages/lab/README.md
index 31ce30f169..7356673f7e 100644
--- a/python/packages/lab/README.md
+++ b/python/packages/lab/README.md
@@ -44,10 +44,12 @@ This structure maintains a single PyPI package `agent-framework-lab` while suppo
 
 ## Installation
 
-Install the base lab package:
+Install from source:
 
 ```bash
-pip install agent-framework-lab
+git clone https://github.com/microsoft/agent-framework.git
+cd agent-framework/python/packages/lab
+pip install -e .
 ```
 
 For details on installing individual modules, see their respective README files listed above.
diff --git a/python/packages/lab/gaia/README.md b/python/packages/lab/gaia/README.md
index f8734e65f9..948df98372 100644
--- a/python/packages/lab/gaia/README.md
+++ b/python/packages/lab/gaia/README.md
@@ -7,10 +7,12 @@ It includes built-in benchmarks as well as utilities for running custom evaluati
 
 ## Setup
 
-Install the agent-framework-lab package with GAIA dependencies:
+Install from source with GAIA dependencies:
 
 ```bash
-pip install "agent-framework-lab[gaia]"
+git clone https://github.com/microsoft/agent-framework.git
+cd agent-framework/python/packages/lab
+pip install -e ".[gaia]"
 ```
 
 Set up Hugging Face token:
diff --git a/python/packages/lab/lightning/README.md b/python/packages/lab/lightning/README.md
index 71cb69ecbd..1ab8a9ff50 100644
--- a/python/packages/lab/lightning/README.md
+++ b/python/packages/lab/lightning/README.md
@@ -8,20 +8,22 @@ This package enables you to train and fine-tune agents using advanced RL algorit
 
 ## Installation
 
-Install the agent-framework-lab package with Lightning dependencies:
+Install from source with Lightning dependencies:
 
 ```bash
-pip install "agent-framework-lab[lightning]"
+git clone https://github.com/microsoft/agent-framework.git
+cd agent-framework/python/packages/lab
+pip install -e ".[lightning]"
 ```
 
 ### Optional Dependencies
 
 ```bash
 # For math-related training
-pip install agent-framework-lab[lightning,math]
+pip install -e ".[lightning,math]"
 
 # For tau2 benchmarking
-pip install agent-framework-lab[lightning,tau2]
+pip install -e ".[lightning,tau2]"
 ```
 
 To prepare for RL training, you'll also need to install dependencies like PyTorch, Ray, and vLLM. See the [Agent-lightning setup instructions](https://github.com/microsoft/agent-lightning) for more details.
diff --git a/python/packages/lab/tau2/README.md b/python/packages/lab/tau2/README.md
index d98e45ade1..b94be70f5f 100644
--- a/python/packages/lab/tau2/README.md
+++ b/python/packages/lab/tau2/README.md
@@ -13,20 +13,22 @@ Each evaluation runs a multi-turn conversation where the user simulator presents
 
 ## Supported Domains
 
-| Domain | Status | Description |
-|--------|--------|-------------|
-| **airline** | ✅ Supported | Customer service for airline booking, changes, and support |
-| **retail** | 🚧 In Development | E-commerce customer support scenarios |
-| **telecom** | 🚧 In Development | Telecommunications service support |
+| Domain      | Status            | Description                                                |
+| ----------- | ----------------- | ---------------------------------------------------------- |
+| **airline** | ✅ Supported      | Customer service for airline booking, changes, and support |
+| **retail**  | 🚧 In Development | E-commerce customer support scenarios                      |
+| **telecom** | 🚧 In Development | Telecommunications service support                         |
 
-*Note: Currently only the airline domain is fully supported.*
+_Note: Currently only the airline domain is fully supported._
 
 ## Installation
 
-Install the agent-framework-lab package with TAU2 dependencies:
+Install from source with TAU2 dependencies:
 
 ```bash
-pip install "agent-framework-lab[tau2]"
+git clone https://github.com/microsoft/agent-framework.git
+cd agent-framework/python/packages/lab
+pip install -e ".[tau2]"
 ```
 
 Download data from [Tau2-Bench](https://github.com/sierra-research/tau2-bench):
@@ -104,15 +106,15 @@ python samples/run_benchmark.py --max-steps 20
 
 The following results are reproduced from our implementation of τ²-bench with `samples/run_benchmark.py`. It shows the average success rate over the dataset of 50 tasks.
 
-| Agent Model | User Model | Success Rate |
-|-------------|------------|----------|
-| gpt-5 | gpt-4.1 | 62.0% |
-| gpt-5-mini | gpt-4.1 | 52.0% |
-| gpt-4.1 | gpt-4.1 | 60.0% |
-| gpt-4.1-mini | gpt-4.1 | 50.0% |
-| gpt-4.1 | gpt-4o-mini | 42.0% |
-| gpt-4o | gpt-4.1 | 42.0% |
-| gpt-4o-mini | gpt-4.1 | 26.0% |
+| Agent Model  | User Model  | Success Rate |
+| ------------ | ----------- | ------------ |
+| gpt-5        | gpt-4.1     | 62.0%        |
+| gpt-5-mini   | gpt-4.1     | 52.0%        |
+| gpt-4.1      | gpt-4.1     | 60.0%        |
+| gpt-4.1-mini | gpt-4.1     | 50.0%        |
+| gpt-4.1      | gpt-4o-mini | 42.0%        |
+| gpt-4o       | gpt-4.1     | 42.0%        |
+| gpt-4o-mini  | gpt-4.1     | 26.0%        |
 
 ## Advanced Usage