Agent Framework Lab - GAIA

The GAIA benchmark can be used for evaluating agents and workflows built using the Agent Framework. It includes built-in benchmarks as well as utilities for running custom evaluations.

Setup

Use uv to install the package with GAIA dependencies:

uv pip install "agent-framework-lab-gaia"

Set up a Hugging Face token:

export HF_TOKEN="hf_..." # must have access to gaia-benchmark/GAIA
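Because the dataset download depends on this variable, it can help to fail early with a clear message when it is missing. The following is a minimal sketch, not part of the package: `require_hf_token` is a hypothetical helper name, and it only checks that the variable is non-empty, not that the token actually has access to gaia-benchmark/GAIA.

```python
import os


def require_hf_token(env=None) -> str:
    """Return the Hugging Face token from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not part of agent_framework.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running the GAIA evaluation."
        )
    return token
```

Calling `require_hf_token()` at the top of an evaluation script surfaces a missing token before any benchmark work starts.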

Create an evaluation script

Create a Python script (e.g., run_gaia.py) with the following content:

import asyncio

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig

async def run_task(task: Task) -> Prediction:
    # Placeholder: replace this with a call to your agent or workflow.
    return Prediction(prediction="answer here", messages=[])

async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl",
    )

    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)

if __name__ == "__main__":
    # Start the event loop so the script actually runs when executed directly.
    asyncio.run(main())

See gaia_sample.py for more details.

Run the evaluation

Run the evaluation script using uv:

uv run python run_gaia.py

By default, the script first looks for cached GAIA data in the data_gaia_hub directory and downloads it if it is not found. Results are saved to gaia_results_<timestamp>.jsonl.
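Since the results file is in JSON Lines format, it can also be inspected programmatically. The sketch below uses only the standard library and makes no assumption about which fields each record contains. `load_jsonl` and `latest_results_file` are hypothetical helper names, not part of the package.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of parsed records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def latest_results_file(directory="."):
    """Return the most recently modified gaia_results_*.jsonl file, or None."""
    candidates = sorted(
        Path(directory).glob("gaia_results_*.jsonl"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    results_path = latest_results_file()
    if results_path is None:
        print("No gaia_results_*.jsonl file found in the current directory.")
    else:
        records = load_jsonl(results_path)
        print(f"{results_path.name}: {len(records)} records")
```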

Don't run the script from inside this directory: Python would then pick up the local agent_framework namespace package instead of the installed one.
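If imports fail unexpectedly, one way to check which copy of agent_framework Python resolves is to inspect its import spec. This is a minimal sketch using the standard library's `importlib.util.find_spec`. `module_locations` is a hypothetical helper name.

```python
import importlib.util


def module_locations(name):
    """Return the filesystem locations Python would import `name` from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module cannot be found on the current import path.
        return []
    if spec.submodule_search_locations is not None:
        # Packages (including namespace packages) list one or more directories.
        return list(spec.submodule_search_locations)
    # Plain modules expose a single origin, when one is known.
    return [spec.origin] if spec.origin else []


if __name__ == "__main__":
    for location in module_locations("agent_framework"):
        print(location)
```

If the printed paths point into the current working directory rather than the environment's site-packages, the local package is shadowing the installed one.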

View results

We provide a console viewer for reading GAIA results:

uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed