Agent Framework Lab - GAIA
The GAIA benchmark can be used for evaluating agents and workflows built using the Agent Framework. It includes built-in benchmarks as well as utilities for running custom evaluations.
Note: This module is part of the consolidated agent-framework-lab package. Install the package with the gaia extra to use this module.
Setup
Install the agent-framework-lab package with GAIA dependencies:
pip install "agent-framework-lab[gaia]"
Set up your Hugging Face token:
export HF_TOKEN="hf_..." # must have access to gaia-benchmark/GAIA
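To fail early with a clear message when the token is missing, a script can check the environment before starting a run. This is a minimal sketch, assuming the token is read from the HF_TOKEN environment variable as in the export above. The helper name require_hf_token is hypothetical and not part of agent-framework-lab:

```python
import os
from collections.abc import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hugging Face token from env, or raise a clear error if unset."""
    # Hypothetical helper: not provided by agent-framework-lab.
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a token with access to gaia-benchmark/GAIA"
        )
    return token
```

Calling require_hf_token() at the top of the evaluation script would stop the run before any tasks are scheduled.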
Create an evaluation script
Create a Python script (e.g., run_gaia.py) with the following content:
import asyncio

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig


async def run_task(task: Task) -> Prediction:
    # Replace this placeholder with a call to your agent or workflow.
    return Prediction(prediction="answer here", messages=[])


async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl"
    )
    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)


if __name__ == "__main__":
    asyncio.run(main())
See gaia_sample.py for more details.
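The parallel argument in the script presumably caps how many tasks run at the same time. The GAIA runner's internals are not shown here, so the following is only a generic asyncio sketch of that pattern. The names run_bounded and fake_task are hypothetical and not part of agent_framework.lab.gaia:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_bounded(
    items: list[str],
    worker: Callable[[str], Awaitable[str]],
    parallel: int,
) -> list[str]:
    """Run worker over items with at most `parallel` calls in flight."""
    semaphore = asyncio.Semaphore(parallel)

    async def guarded(item: str) -> str:
        # Each call waits for a free slot before invoking the worker.
        async with semaphore:
            return await worker(item)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_task(question: str) -> str:
    # Stand-in for an agent call; a real run_task would query a model here.
    await asyncio.sleep(0.01)
    return f"answer to {question}"
```

A call such as asyncio.run(run_bounded(["q1", "q2", "q3"], fake_task, parallel=2)) keeps at most two fake_task calls active at once, which is useful when the underlying model endpoint is rate limited.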
View results
We provide a console viewer for reading GAIA results:
uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed
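For programmatic analysis, the results file can also be read directly. This is a minimal sketch, assuming only that the file is JSON Lines with one JSON object per line. The record schema is not documented here, so the sketch counts the top-level keys instead of assuming field names. load_results and count_keys are hypothetical helpers, not part of the package:

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any


def load_results(path: str) -> list[dict[str, Any]]:
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records: list[dict[str, Any]] = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_keys(records: list[dict[str, Any]]) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

Running count_keys(load_results(path)) on a results file is a quick way to discover which fields are present before writing any further analysis.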