Files
Eduard van Valkenburg 50fdcbaf57 Python: chore(python): improve dependency range automation (#4343)
* chore(python): improve dependency range automation

- tighten dependency bounds and coding standards guidance\n- add dependency range validation workflow, reporting, and issue automation\n- update related tests and dependency pins for compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated text and pyarrow

* new lock

* fixed workflow

* updated deps

* fix tiktoken

* chore(python): refine dependency validation workflows

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(python): add high-level dependency validation comments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* WIP

* added additional comments and excludes

* added dev dependency handling and workflow and updates to package ranges

* added readme and simplified commands

* fix markers

* chore(python): address dependency review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten dependency bounds, remove stale overrides, restore Python 3.10 support

- Apply dependency bound policy across all packages: stable >=1.0 deps use
  >=floor,<next_major; pre-1.0/prerelease deps use validated hard-bounded ranges
- Remove stale root tool.uv.override-dependencies (uvicorn, websockets, grpcio)
- Lower github_copilot requires-python to >=3.10 with github-copilot-sdk gated
  behind python_version >= 3.11 marker; import raises ImportError on 3.10
- Skip github_copilot pyright/mypy/test tasks on Python <3.11
- Use version-conditional pyrightconfig for samples on Python 3.10
- Add compatibility fix in core responses client for older openai typed dicts
- Normalize uv.lock prerelease mode and refresh dev dependencies
- Update CODING_STANDARD.md, DEV_SETUP.md, and package management skill docs

Closes #902

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* small tweaks

* add note in workflow

* fix workflows and several versions

* fix duplicate

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
50fdcbaf57 ยท 2026-03-13 12:32:37 +00:00
History
..

Agent Framework Lab - GAIA

The GAIA benchmark can be used for evaluating agents and workflows built using the Agent Framework. It includes built-in benchmarks as well as utilities for running custom evaluations.

Note

: This module is part of the consolidated agent-framework-lab package. Install the package with the gaia extra to use this module.

Setup

Install the agent-framework-lab package with GAIA dependencies:

pip install "agent-framework-lab[gaia]"

Set up Hugging Face token:

export HF_TOKEN="hf\*..." # must have access to gaia-benchmark/GAIA

Create an evaluation script

Create a Python script (e.g., run_gaia.py) with the following content:

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig

async def run_task(task: Task) -> Prediction:
    return Prediction(prediction="answer here", messages=[])

async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl"
    )

    runner = GAIA(telemetry_config=telemetry_config)
    await runner.run(run_task, level=1, max_n=5, parallel=2)

See the gaia_sample.py for more detail.

View results

We provide a console viewer for reading GAIA results:

uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed