Files
agent-framework/.github/workflows/python-integration-tests.yml
T
Giles Odigwe 3f23e1dfbf Python: Flaky test report (#5342)
* Add flaky test trend reporting to CI workflows

Parse JUnit XML (pytest.xml) from each integration test job and
aggregate results into a markdown trend report showing per-test
pass/fail/skip status across the last 5 runs.

Changes:
- Add python/scripts/flaky_report/ package (JUnit XML parser + trend
  report generator following the sample_validation pattern)
- Add upload-artifact steps to all 6 integration test jobs in both
  python-merge-tests.yml and python-integration-tests.yml
- Add python-flaky-test-report aggregation job with history caching
- Add --junitxml=pytest.xml to integration-tests.yml jobs (already
  present in merge-tests.yml)
- Fix Cosmos job --junitxml path (use absolute path since uv run
  --directory changes cwd)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix flaky report: handle missing test results gracefully

- Guard against missing reports directory in load_current_run()
- Only run report job when at least one integration test job completed
  (skip when all jobs are skipped, e.g. on pull_request events)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: fix provider names and if-expression precedence

- Use explicit provider name mapping in _derive_provider() so OpenAI
  renders correctly instead of 'Openai'
- Fix operator precedence in workflow if-expressions by wrapping
  success/failure checks in parentheses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add File column and xfail detection to flaky test report

- Add File column showing module name (e.g., test_openai_chat_client)
  to disambiguate tests with the same function name across files
- Detect pytest xfail tests in JUnit XML (type=pytest.xfail) and
  show them with a distinct warning emoji instead of skip emoji
- Update legend to include xfail explanation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Foundry embedding env vars to merge-tests workflow

Sync the Foundry integration job in python-merge-tests.yml with
python-integration-tests.yml by adding FOUNDRY_MODELS_ENDPOINT,
FOUNDRY_MODELS_API_KEY, FOUNDRY_EMBEDDING_MODEL, and
FOUNDRY_IMAGE_EMBEDDING_MODEL. Once the repo variables/secrets
are configured, the embedding integration test will run in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix File column showing class name instead of module name

When a test is inside a class, pytest writes the classname as e.g.
'pkg.test_file.TestClass'. The previous rsplit logic extracted
'TestClass' instead of 'test_file'. Now detect uppercase-starting
segments as class names and use the preceding segment instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: UTC timestamps, XML error handling, summary fix, docstring

- Use datetime.now(timezone.utc) for accurate UTC timestamps
- Catch ET.ParseError per-file so corrupt XML doesn't crash the report
- Remove separate 'error' key from summary (errors folded into 'failed')
- Fix _short_name docstring to show actual dotted classname::name format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-22 20:16:50 +00:00

482 lines
17 KiB
YAML

#
# Dedicated Python integration tests workflow, called from the manual integration test orchestrator.
# Runs all tests (unit + integration) split into parallel jobs by provider.
#
# NOTE: This workflow and python-merge-tests.yml share the same set of parallel
# test jobs. Keep them in sync — when adding, removing, or modifying a job here,
# apply the same change to python-merge-tests.yml.
#
name: python-integration-tests
on:
workflow_call:
inputs:
checkout-ref:
description: "Git ref to checkout (e.g., refs/pull/123/head)"
required: true
type: string
permissions:
contents: read
id-token: write
env:
UV_CACHE_DIR: /tmp/.uv-cache
UV_PYTHON: "3.13"
jobs:
# Unit tests: all non-integration tests across all packages
python-tests-unit:
name: Python Integration Tests - Unit
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Test with pytest (unit tests only)
run: >
uv run poe test -A
-m "not integration"
--timeout=120 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 5
# OpenAI integration tests
python-tests-openai:
name: Python Integration Tests - OpenAI
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
env:
OPENAI_CHAT_COMPLETION_MODEL: ${{ vars.OPENAI__CHATMODELID }}
OPENAI_CHAT_MODEL: ${{ vars.OPENAI__RESPONSESMODELID }}
OPENAI_MODEL: ${{ vars.OPENAI__RESPONSESMODELID }}
OPENAI_EMBEDDING_MODEL: ${{ vars.OPENAI_EMBEDDING_MODEL_ID }}
OPENAI_API_KEY: ${{ secrets.OPENAI__APIKEY }}
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Test with pytest (OpenAI integration)
run: >
uv run pytest --import-mode=importlib
packages/openai/tests
-m "integration and not azure"
-n logical --dist worksteal
--timeout=120 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 5
--junitxml=pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-openai
path: ./python/pytest.xml
if-no-files-found: ignore
# Azure OpenAI integration tests
python-tests-azure-openai:
name: Python Integration Tests - Azure OpenAI
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
env:
AZURE_OPENAI_CHAT_COMPLETION_MODEL: ${{ vars.AZUREOPENAI__CHATDEPLOYMENTNAME }}
AZURE_OPENAI_CHAT_MODEL: ${{ vars.AZUREOPENAI__RESPONSESDEPLOYMENTNAME }}
AZURE_OPENAI_MODEL: ${{ vars.AZUREOPENAI__RESPONSESDEPLOYMENTNAME }}
AZURE_OPENAI_EMBEDDING_MODEL: ${{ vars.AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME }}
AZURE_OPENAI_ENDPOINT: ${{ vars.AZUREOPENAI__ENDPOINT }}
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Azure CLI Login
uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Test with pytest (Azure OpenAI integration)
run: >
uv run pytest --import-mode=importlib
packages/openai/tests/openai/test_openai_chat_completion_client_azure.py
packages/openai/tests/openai/test_openai_chat_client_azure.py
packages/openai/tests/openai/test_openai_embedding_client_azure.py
-m integration
-n logical --dist worksteal
--timeout=120 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 5
--junitxml=pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-azure-openai
path: ./python/pytest.xml
if-no-files-found: ignore
# Misc integration tests (Anthropic, Hyperlight, Ollama, MCP)
python-tests-misc-integration:
name: Python Integration Tests - Misc
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_CHAT_MODEL: ${{ vars.ANTHROPIC_CHAT_MODEL_ID }}
LOCAL_MCP_URL: ${{ vars.LOCAL_MCP__URL }}
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Start local MCP server
id: local-mcp
uses: ./.github/actions/setup-local-mcp-server
with:
fallback_url: ${{ env.LOCAL_MCP_URL }}
- name: Prefer local MCP URL when available
run: echo "LOCAL_MCP_URL=${{ steps.local-mcp.outputs.effective_url }}" >> "$GITHUB_ENV"
- name: Test with pytest (Anthropic, Hyperlight, Ollama, MCP integration)
run: >
uv run pytest --import-mode=importlib
packages/anthropic/tests
packages/hyperlight/tests
packages/ollama/tests
packages/core/tests/core/test_mcp.py
-m integration
-n logical --dist worksteal
--timeout=120 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 30
--junitxml=pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-misc
path: ./python/pytest.xml
if-no-files-found: ignore
- name: Stop local MCP server
if: always()
shell: bash
run: |
set -euo pipefail
server_pid="${{ steps.local-mcp.outputs.pid }}"
if [[ -z "$server_pid" ]]; then
exit 0
fi
if ! kill -0 "$server_pid" 2>/dev/null; then
exit 0
fi
kill -TERM -- "-$server_pid" 2>/dev/null || kill -TERM "$server_pid" 2>/dev/null || true
for _ in $(seq 1 10); do
if ! kill -0 "$server_pid" 2>/dev/null; then
exit 0
fi
sleep 1
done
kill -KILL -- "-$server_pid" 2>/dev/null || kill -KILL "$server_pid" 2>/dev/null || true
# Azure Functions + Durable Task integration tests
python-tests-functions:
name: Python Integration Tests - Functions
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
env:
UV_PYTHON: "3.11"
OPENAI_CHAT_COMPLETION_MODEL: ${{ vars.OPENAI__CHATMODELID }}
OPENAI_CHAT_MODEL: ${{ vars.OPENAI__RESPONSESMODELID }}
OPENAI_MODEL: ${{ vars.OPENAI__RESPONSESMODELID }}
OPENAI_EMBEDDING_MODEL: ${{ vars.OPENAI_EMBEDDING_MODEL_ID }}
OPENAI_API_KEY: ${{ secrets.OPENAI__APIKEY }}
AZURE_OPENAI_ENDPOINT: ${{ vars.AZUREOPENAI__ENDPOINT }}
AZURE_OPENAI_MODEL: ${{ vars.AZUREOPENAI__RESPONSESDEPLOYMENTNAME }}
AZURE_OPENAI_CHAT_MODEL: ${{ vars.AZUREOPENAI__RESPONSESDEPLOYMENTNAME }}
AZURE_OPENAI_CHAT_COMPLETION_MODEL: ${{ vars.AZUREOPENAI__CHATDEPLOYMENTNAME }}
FOUNDRY_PROJECT_ENDPOINT: ${{ vars.FOUNDRY_PROJECT_ENDPOINT }}
FOUNDRY_MODEL: ${{ vars.FOUNDRY_MODEL }}
FUNCTIONS_WORKER_RUNTIME: "python"
DURABLE_TASK_SCHEDULER_CONNECTION_STRING: "Endpoint=http://localhost:8080;TaskHub=default;Authentication=None"
AzureWebJobsStorage: "UseDevelopmentStorage=true"
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Azure CLI Login
uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Set up Azure Functions Integration Test Emulators
uses: ./.github/actions/azure-functions-integration-setup
id: azure-functions-setup
- name: Test with pytest (Functions + Durable Task integration)
run: >
uv run pytest --import-mode=importlib
packages/azurefunctions/tests/integration_tests
packages/durabletask/tests/integration_tests
-m integration
-n logical --dist worksteal
-x
--timeout=360 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 5
--junitxml=pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-functions
path: ./python/pytest.xml
if-no-files-found: ignore
# Foundry integration tests
python-tests-foundry:
name: Python Integration Tests - Foundry
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
env:
FOUNDRY_PROJECT_ENDPOINT: ${{ vars.FOUNDRY_PROJECT_ENDPOINT }}
FOUNDRY_MODEL: ${{ vars.FOUNDRY_MODEL }}
FOUNDRY_AGENT_NAME: ${{ vars.FOUNDRY_AGENT_NAME }}
FOUNDRY_AGENT_VERSION: ${{ vars.FOUNDRY_AGENT_VERSION }}
FOUNDRY_MODELS_ENDPOINT: ${{ vars.FOUNDRY_MODELS_ENDPOINT || '' }}
FOUNDRY_MODELS_API_KEY: ${{ secrets.FOUNDRY_MODELS_API_KEY || '' }}
FOUNDRY_EMBEDDING_MODEL: ${{ vars.FOUNDRY_EMBEDDING_MODEL || '' }}
FOUNDRY_IMAGE_EMBEDDING_MODEL: ${{ vars.FOUNDRY_IMAGE_EMBEDDING_MODEL || '' }}
LOCAL_MCP_URL: ${{ vars.LOCAL_MCP__URL }}
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Azure CLI Login
uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Test with pytest
timeout-minutes: 15
run: >
uv run pytest --import-mode=importlib
packages/foundry/tests
-m integration
-n logical --dist worksteal
--timeout=120 --session-timeout=900 --timeout_method thread
--retries 2 --retry-delay 5
--junitxml=pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-foundry
path: ./python/pytest.xml
if-no-files-found: ignore
# Azure Cosmos integration tests
python-tests-cosmos:
name: Python Integration Tests - Cosmos
runs-on: ubuntu-latest
environment: integration
timeout-minutes: 60
services:
cosmosdb:
image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
ports:
- 8081:8081
env:
AZURE_COSMOS_ENDPOINT: "http://localhost:8081/"
# Static Azure Cosmos DB emulator key (documented): https://learn.microsoft.com/en-us/azure/cosmos-db/emulator
AZURE_COSMOS_KEY: "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw=="
AZURE_COSMOS_DATABASE_NAME: "agent-framework-cosmos-it-db"
AZURE_COSMOS_CONTAINER_NAME: "agent-framework-cosmos-it-container"
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
id: python-setup
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Wait for Cosmos DB emulator
run: |
for i in {1..60}; do
if curl --silent --show-error http://localhost:8081/ > /dev/null; then
echo "Cosmos DB emulator is ready."
exit 0
fi
sleep 2
done
echo "Cosmos DB emulator did not become ready in time." >&2
exit 1
- name: Test with pytest (Cosmos integration)
run: uv run --directory packages/azure-cosmos poe integration-tests -n logical --dist worksteal --timeout=120 --session-timeout=900 --timeout_method thread --retries 2 --retry-delay 5 --junitxml=${{ github.workspace }}/python/pytest.xml
- name: Upload test results
if: always()
uses: actions/upload-artifact@v7
with:
name: test-results-cosmos
path: ./python/pytest.xml
if-no-files-found: ignore
# Flaky test trend report (aggregates per-job JUnit XML results)
python-flaky-test-report:
name: Flaky Test Report
if: >
always() &&
(contains(join(needs.*.result, ','), 'success') ||
contains(join(needs.*.result, ','), 'failure'))
needs:
[
python-tests-openai,
python-tests-azure-openai,
python-tests-misc-integration,
python-tests-functions,
python-tests-foundry,
python-tests-cosmos,
]
runs-on: ubuntu-latest
defaults:
run:
working-directory: python
steps:
- uses: actions/checkout@v6
with:
ref: ${{ inputs.checkout-ref }}
persist-credentials: false
- name: Set up python and install the project
uses: ./.github/actions/python-setup
with:
python-version: ${{ env.UV_PYTHON }}
os: ${{ runner.os }}
- name: Download all test results from current run
uses: actions/download-artifact@v4
with:
pattern: test-results-*
path: test-results/
- name: Restore flaky report history cache
uses: actions/cache/restore@v4
with:
path: python/flaky-report-history.json
key: flaky-report-history-integration-${{ github.run_id }}
restore-keys: |
flaky-report-history-integration-
- name: Generate trend report
run: >
uv run python scripts/flaky_report/aggregate.py
../test-results/
flaky-report-history.json
flaky-test-report.md
- name: Post to Job Summary
if: always()
run: cat flaky-test-report.md >> $GITHUB_STEP_SUMMARY
- name: Save flaky report history cache
if: always()
uses: actions/cache/save@v4
with:
path: python/flaky-report-history.json
key: flaky-report-history-integration-${{ github.run_id }}
- name: Upload unified trend report
if: always()
uses: actions/upload-artifact@v7
with:
name: flaky-test-report
path: |
python/flaky-test-report.md
python/flaky-report-history.json
python-integration-tests-check:
if: always()
runs-on: ubuntu-latest
needs:
[
python-tests-unit,
python-tests-openai,
python-tests-azure-openai,
python-tests-misc-integration,
python-tests-functions,
python-tests-foundry,
python-tests-cosmos
]
steps:
- name: Fail workflow if tests failed
if: contains(join(needs.*.result, ','), 'failure')
uses: actions/github-script@v8
with:
script: core.setFailed('Integration Tests Failed!')
- name: Fail workflow if tests cancelled
if: contains(join(needs.*.result, ','), 'cancelled')
uses: actions/github-script@v8
with:
script: core.setFailed('Integration Tests Cancelled!')