Agent Framework Lab - GAIA

The GAIA benchmark can be used to evaluate agents and workflows built with the Agent Framework. This module includes built-in benchmarks as well as utilities for running custom evaluations.

Note: This module is part of the consolidated agent-framework-lab package. Install the package with the gaia extra to use this module.

Setup

Install the agent-framework-lab package with GAIA dependencies:

pip install "agent-framework-lab[gaia]"

Set your Hugging Face token:

export HF_TOKEN="hf_..."  # must have access to gaia-benchmark/GAIA
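
A missing token is easier to diagnose before the benchmark starts than partway through a dataset download. The following is a minimal sketch of such a pre-flight check; the helper name require_hf_token is hypothetical and not part of agent-framework-lab, and the only thing it relies on is the HF_TOKEN variable described above.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of agent-framework-lab.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export a Hugging Face token that has "
            "access to gaia-benchmark/GAIA before running the benchmark."
        )
    return token
```

Calling require_hf_token() at the top of your evaluation script fails fast with a readable message. It does not verify that the token actually has access to the dataset.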

Create an evaluation script

Create a Python script (e.g., run_gaia.py) with the following content:

import asyncio

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig

async def run_task(task: Task) -> Prediction:
    # Placeholder: replace with a call into your agent or workflow.
    return Prediction(prediction="answer here", messages=[])

async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl"
    )

    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)

if __name__ == "__main__":
    # Start the event loop and run the evaluation.
    asyncio.run(main())

See gaia_sample.py for more detail.

View results

We provide a console viewer for reading GAIA results:

uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed
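
The results file can also be inspected programmatically. The sketch below assumes only what the .jsonl extension suggests, namely one JSON object per line; it assumes no particular field names, since this README does not document the record schema. The helper name load_jsonl is hypothetical.

```python
import json
from typing import Any


def load_jsonl(path: str) -> list[dict[str, Any]]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

After loading, something like sorted(records[0].keys()) shows which fields are actually present in the file before you write analysis code against them.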