mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
# Agent Framework Lab - GAIA

The GAIA benchmark can be used for evaluating agents and workflows built using the Agent Framework.
It includes built-in benchmarks as well as utilities for running custom evaluations.

> **Note**: This module is part of the consolidated `agent-framework-lab` package. Install the package with the `gaia` extra to use this module.

## Setup

Install the `agent-framework-lab` package with GAIA dependencies:

```bash
pip install "agent-framework-lab[gaia]"
```

Set your Hugging Face token:

```bash
export HF_TOKEN="hf_..." # must have access to gaia-benchmark/GAIA
```

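Because the benchmark data is gated behind the token, it can help to fail fast when `HF_TOKEN` is missing. The following is a minimal sketch using only the standard library; `check_hf_token` is a hypothetical helper name, not part of `agent-framework-lab`, and it only checks that the variable is set, not that the token actually has access to `gaia-benchmark/GAIA`.

```python
import os


def check_hf_token(env=None):
    """Return the token if HF_TOKEN is set and non-empty, else raise.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a Hugging Face token that has "
            "access to gaia-benchmark/GAIA before running the benchmark."
        )
    return token


if __name__ == "__main__":
    try:
        check_hf_token()
        print("HF_TOKEN is set.")
    except RuntimeError as exc:
        print(f"Setup problem: {exc}")
```

Calling such a check at the top of an evaluation script surfaces a missing token before any tasks are scheduled.
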
## Create an evaluation script

Create a Python script (e.g., `run_gaia.py`) with the following content:

```python
import asyncio

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig


async def run_task(task: Task) -> Prediction:
    # Placeholder: replace with a call to your agent or workflow.
    return Prediction(prediction="answer here", messages=[])


async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl",
    )

    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)


if __name__ == "__main__":
    asyncio.run(main())
```

See [gaia_sample.py](./samples/gaia_sample.py) for more detail.

## View results

We provide a console viewer for reading GAIA results:

```bash
uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed
```
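
Since the results are written as a `.jsonl` file, they can also be inspected programmatically. The sketch below assumes only that each non-blank line is a standalone JSON value; the record schema is not documented here, so the code just counts records and reports which top-level keys appear. `parse_jsonl`, `load_jsonl`, and `field_counts` are hypothetical helper names, not part of the package.

```python
import json
import sys
from collections import Counter


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSON Lines file from disk and return its parsed records."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


def field_counts(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_results.py <path to a results .jsonl file>
    loaded = load_jsonl(sys.argv[1])
    print(f"{len(loaded)} records")
    for key, count in sorted(field_counts(loaded).items()):
        print(f"{key}: present in {count} records")
```

Listing the keys first is a cautious way to learn the actual schema before writing any analysis that depends on specific fields.
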