Giles Odigwe 540193ccef Python: Reduce flaky integration tests and improve CI signal quality (#5454)
* Enable Ollama integration tests in CI and rename report to Integration Test Report

- Install Ollama, cache models (qwen2.5:0.5b + nomic-embed-text), and start
  server in the Misc integration job for both workflow files
- Set OLLAMA_MODEL and OLLAMA_EMBEDDING_MODEL env vars so the 5 Ollama tests
  are no longer skipped
- Rename Flaky Test Report to Integration Test Report throughout (job names,
  artifact names, cache keys, file names, script titles/docstrings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump Ollama model to qwen2.5:1.5b for better instruction following

The 0.5b model was too small to reliably follow simple prompts like
'Say Hello World', causing test assertion failures. The 1.5b model
follows instructions more reliably while still being small enough
for fast CI pulls (~1GB).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable reliable streaming integration tests

Remove the hard skip on the test_03_reliable_streaming tests, which had been
temporarily disabled to investigate instability. The CI infrastructure
(Azurite, DTS emulator, Redis, func CLI) is already in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable skipped Functions/DurableTask tests and bump timeout to 480s

- Remove hard skips from 4 tests in test_11_workflow_parallel.py
- Remove hard skip from test_conditional_branching in test_06_dt_multi_agent_orchestration_conditionals.py
- Increase pytest --timeout from 360 to 480 for Functions+DurableTask CI job
- Updated in both python-merge-tests.yml and python-integration-tests.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-skip failing Functions/DurableTask tests with specific root causes

- test_11_workflow_parallel (4 tests): xdist worker crashes during execution
- test_conditional_branching: orchestration fails with RuntimeError, not a timeout
- Keep 480s timeout bump for remaining Functions tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix auth routing in samples 06/11: api_key -> credential for Azure OpenAI

Both samples passed a bearer token provider via api_key= which caused the
client to route to api.openai.com instead of Azure OpenAI, resulting in
401 Unauthorized. Changed to credential= which correctly triggers Azure
routing and picks up AZURE_OPENAI_ENDPOINT from the environment.

- samples/azure_functions/11_workflow_parallel/function_app.py: 1 fix
- samples/durabletask/06_multi_agent_orchestration_conditionals/worker.py: 2 fixes
- Re-enable 4 parallel workflow tests and 1 conditional branching test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-skip parallel workflow tests: xdist worker distribution issue

The 4 parallel workflow tests crash because xdist worksteal distributes
them across separate workers, each spawning its own func process against
shared emulators. Auth fix (api_key->credential) was valid and stays.
test_conditional_branching now passes with the auth fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix E501 line-too-long in azurefunctions parallel test skip reasons

Wrap skip reason strings to stay within 120 char line limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add retry logic and port-conflict fix for Ollama CI setup

- Kill any auto-started Ollama before launching serve (fixes port
  conflict: 'address already in use')
- Retry ollama pull up to 3 times with 15s backoff (fixes 429 rate
  limit failures)
- Applied to both python-merge-tests.yml and python-integration-tests.yml
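The pull-retry behavior above is implemented as shell in the workflow YAML. What follows is a minimal Python sketch of the same control flow, not the workflow's actual code. It assumes a hypothetical `pull_with_retry` helper that shells out to `ollama pull` and treats a non-zero exit code as failure. The helper waits a fixed 15 seconds between attempts and raises after the last one, so the caller fails fast instead of proceeding to the tests.

```python
import subprocess
import time
from collections.abc import Callable, Sequence


def _run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def pull_with_retry(
    model: str,
    attempts: int = 3,
    delay_seconds: float = 15.0,
    run: Callable[[Sequence[str]], int] = _run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: retry `ollama pull <model>`; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if run(["ollama", "pull", model]) == 0:
            return attempt
        if attempt < attempts:
            # Fixed delay between attempts, e.g. to ride out a 429 rate limit.
            sleep(delay_seconds)
    # Surface the failure instead of silently continuing to the test step.
    raise RuntimeError(f"ollama pull {model} failed after {attempts} attempts")
```

The `run` and `sleep` parameters are injectable, so the loop can be exercised without a network connection or real delays.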

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix flaky integration tests and re-enable skipped tests

- Foundry agent: add allow_preview=True to custom client test
- Foundry hosting: raise max_output_tokens 50->200, add temperature,
  relax assertion in test_temperature_and_max_tokens
- Foundry embedding: update skip reason with root cause (endpoint mismatch)
- OpenAI file search: fix vector store indexing race condition by polling
  file_counts before querying; fix get_streaming_response -> get_response(stream=True)
- Azure OpenAI file search: remove skip (transient 500 resolved)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove temperature from foundry hosting test (unsupported by CI model)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stabilize Ollama tool call integration tests with no-arg function

Use a no-argument greet() function instead of hello_world(arg1) for
integration tests. The 1.5B model in CI is unreliable at generating
correct tool call arguments, causing 'Argument parsing failed' errors.
A no-arg function eliminates this flakiness entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Increase reliable streaming test timeouts from 30s to 60s

The LLM call through Azure OpenAI + Redis streaming pipeline can exceed
30s in CI due to cold starts or throttling. Raise to 60s to reduce
flaky timeouts while still bounded by pytest's 120s per-test limit.
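The reliable-streaming tests that this change touches are not part of this file. The sketch below assumes the wait is expressed with `asyncio.wait_for`; the real tests may bound it differently. Under that assumption, a per-operation timeout kept below the pytest per-test limit could look like the following. `await_with_timeout` and both constants are hypothetical names.

```python
import asyncio
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")

# Hypothetical constants mirroring the values in the commit message.
STREAM_TIMEOUT_SECONDS = 60.0  # raised from 30.0
PYTEST_PER_TEST_TIMEOUT_SECONDS = 120.0

# Keep the inner wait strictly shorter than the outer pytest timeout, so a slow call
# fails with a clear TimeoutError before the pytest per-test timeout interrupts the test.
assert STREAM_TIMEOUT_SECONDS < PYTEST_PER_TEST_TIMEOUT_SECONDS


async def await_with_timeout(awaitable: Awaitable[T], timeout: float = STREAM_TIMEOUT_SECONDS) -> T:
    """Await `awaitable`; cancel it and raise asyncio.TimeoutError after `timeout` seconds."""
    return await asyncio.wait_for(awaitable, timeout=timeout)
```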

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable workflow parallel tests with xdist_group marker

The tests were skipped because xdist distributes module tests across
workers, each spawning their own func process (port conflicts). Adding
xdist_group forces all tests in this module onto a single worker so
the module-scoped function_app_for_test fixture works correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "Re-enable workflow parallel tests with xdist_group marker"

This reverts commit 455c28da62.

* Rename flaky_report to integration_test_report and add try/finally cleanup

- Rename scripts/flaky_report/ to scripts/integration_test_report/ to
  reflect expanded scope beyond flaky-test detection
- Update workflow references in both CI files
- Wrap file search integration tests in try/finally to ensure vector
  store cleanup runs even on test failure or timeout
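The try/finally cleanup above can be packaged as a context manager, so every file search test gets the same guarantee. This is a minimal sketch; the hypothetical `create` and `delete` callables stand in for the real vector-store API calls. The `finally` clause runs whenever the body exits, including when it raises. An async test would use `contextlib.asynccontextmanager` with the same structure.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def temporary_vector_store(create: Callable[[], str], delete: Callable[[str], None]) -> Iterator[str]:
    """Create a store, hand its id to the caller, and always delete it afterwards."""
    store_id = create()
    try:
        yield store_id
    finally:
        # Runs on normal exit and when the body raises (e.g. a failed assertion).
        delete(store_id)
```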

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Ollama pull failure propagation and Azure OpenAI vector store readiness

- Ollama CI: fail the step immediately if model pull fails after 3
  retries instead of silently proceeding to tests
- Azure OpenAI file search: add the same vector-store readiness polling
  that was applied to the non-Azure OpenAI tests, preventing eventual
  consistency race conditions
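The following is a minimal sketch of the readiness polling described above. It assumes a hypothetical `get_counts` callable that returns the vector store's `file_counts` as a dict with `in_progress`, `failed`, and `total` keys; the OpenAI API reports these counts on the vector store object. The store is treated as ready once at least one file is registered and none are still in progress. The polling interval and timeout are illustrative values, not the ones used in the tests.

```python
import time
from collections.abc import Callable


def wait_for_vector_store_ready(
    get_counts: Callable[[], dict[str, int]],
    timeout_seconds: float = 60.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, int]:
    """Poll file counts until indexing finishes; raise on failed files or on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        counts = get_counts()
        if counts.get("total", 0) > 0 and counts.get("in_progress", 0) == 0:
            if counts.get("failed", 0) > 0:
                raise RuntimeError(f"vector store indexing failed: {counts}")
            return counts
        if clock() >= deadline:
            raise TimeoutError(f"vector store not ready after {timeout_seconds}s: {counts}")
        sleep(interval_seconds)
```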

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* remove load_dotenv from test file

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 00:41:39 +00:00


# Copyright (c) Microsoft. All rights reserved.

import os
from collections.abc import AsyncIterable
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from agent_framework import (
    BaseChatClient,
    ChatResponseUpdate,
    Content,
    Message,
    chat_middleware,
    tool,
)
from agent_framework.exceptions import ChatClientException, ChatClientInvalidRequestException, SettingNotFoundError
from ollama import AsyncClient
from ollama._types import ChatResponse as OllamaChatResponse
from ollama._types import Message as OllamaMessage
from openai import AsyncStream
from pytest import fixture

from agent_framework_ollama import OllamaChatClient


# region Service Setup

skip_if_azure_integration_tests_disabled = pytest.mark.skipif(
    os.getenv("OLLAMA_MODEL", "") in ("", "test-model"),
    reason="No real Ollama chat model provided; skipping integration tests.",
)


# region: Connector Settings fixtures


@fixture
def exclude_list(request: Any) -> list[str]:
    """Fixture that returns a list of environment variables to exclude."""
    return request.param if hasattr(request, "param") else []


@fixture
def override_env_param_dict(request: Any) -> dict[str, str]:
    """Fixture that returns a dict of environment variables to override."""
    return request.param if hasattr(request, "param") else {}


# The two fixtures above are shared by several tests, including non-connector tests.


@fixture()
def ollama_unit_test_env(monkeypatch, exclude_list, override_env_param_dict):  # type: ignore
    """Fixture to set environment variables for OllamaSettings."""
    if exclude_list is None:
        exclude_list = []

    if override_env_param_dict is None:
        override_env_param_dict = {}

    env_vars = {"OLLAMA_HOST": "http://localhost:12345", "OLLAMA_MODEL": "test"}

    env_vars.update(override_env_param_dict)  # type: ignore

    for key, value in env_vars.items():
        if key in exclude_list:
            monkeypatch.delenv(key, raising=False)  # type: ignore
            continue
        monkeypatch.setenv(key, value)  # type: ignore

    return env_vars


@fixture
def chat_history() -> list[Message]:
    return []


@fixture
def mock_streaming_chat_completion_response() -> AsyncStream[OllamaChatResponse]:
    response = OllamaChatResponse(
        message=OllamaMessage(content="test", role="assistant"),
        model="test",
    )
    stream = MagicMock(spec=AsyncStream)
    stream.__aiter__.return_value = [response]
    return stream


@fixture
def mock_streaming_chat_completion_response_reasoning() -> AsyncStream[OllamaChatResponse]:
    response = OllamaChatResponse(
        message=OllamaMessage(thinking="test", role="assistant"),
        model="test",
    )
    stream = MagicMock(spec=AsyncStream)
    stream.__aiter__.return_value = [response]
    return stream


@fixture
def mock_chat_completion_response() -> OllamaChatResponse:
    return OllamaChatResponse(
        message=OllamaMessage(content="test", role="assistant"),
        model="test",
        eval_count=1,
        prompt_eval_count=1,
        created_at="2024-01-01T00:00:00Z",
    )


@fixture
def mock_chat_completion_response_reasoning() -> OllamaChatResponse:
    return OllamaChatResponse(
        message=OllamaMessage(thinking="test", role="assistant"),
        model="test",
        eval_count=1,
        prompt_eval_count=1,
        created_at="2024-01-01T00:00:00Z",
    )


@fixture
def mock_streaming_chat_completion_tool_call() -> AsyncStream[OllamaChatResponse]:
    ollama_tool_call = OllamaChatResponse(
        message=OllamaMessage(
            content="",
            role="assistant",
            tool_calls=[{"function": {"name": "hello_world", "arguments": {"arg1": "value1"}}}],
        ),
        model="test",
    )
    stream = MagicMock(spec=AsyncStream)
    stream.__aiter__.return_value = [ollama_tool_call]
    return stream


@fixture
def mock_chat_completion_tool_call() -> OllamaChatResponse:
    return OllamaChatResponse(
        message=OllamaMessage(
            content="",
            role="assistant",
            tool_calls=[{"function": {"name": "hello_world", "arguments": {"arg1": "value1"}}}],
        ),
        model="test",
        created_at="2024-01-01T00:00:00Z",
    )


@tool(approval_mode="never_require")
def hello_world(arg1: str) -> str:
    return "Hello World"


@tool(approval_mode="never_require")
def greet() -> str:
    """Say hello to the world. No-arg tool for integration tests to avoid argument parsing flakiness."""
    return "Hello World"


def test_init(ollama_unit_test_env: dict[str, str]) -> None:
    # Test successful initialization
    ollama_chat_client = OllamaChatClient()

    assert ollama_chat_client.client is not None
    assert isinstance(ollama_chat_client.client, AsyncClient)
    assert ollama_chat_client.model == ollama_unit_test_env["OLLAMA_MODEL"]
    assert isinstance(ollama_chat_client, BaseChatClient)


def test_init_client(ollama_unit_test_env: dict[str, str]) -> None:
    # Test successful initialization with a provided client
    test_client = MagicMock(spec=AsyncClient)
    # Mock the underlying HTTP client's base_url with the configured host
    test_client._client = MagicMock()
    test_client._client.base_url = ollama_unit_test_env["OLLAMA_HOST"]

    ollama_chat_client = OllamaChatClient(client=test_client)

    assert ollama_chat_client.client is test_client
    assert ollama_chat_client.model == ollama_unit_test_env["OLLAMA_MODEL"]
    assert isinstance(ollama_chat_client, BaseChatClient)


@pytest.mark.parametrize("exclude_list", [["OLLAMA_MODEL"]], indirect=True)
def test_with_invalid_settings(ollama_unit_test_env: dict[str, str]) -> None:
    with pytest.raises(SettingNotFoundError, match="Required setting 'model'"):
        OllamaChatClient(
            host="http://localhost:12345",
            model=None,
        )


def test_serialize(ollama_unit_test_env: dict[str, str]) -> None:
    settings = {
        "host": ollama_unit_test_env["OLLAMA_HOST"],
        "model": ollama_unit_test_env["OLLAMA_MODEL"],
    }

    ollama_chat_client = OllamaChatClient.from_dict(settings)
    serialized = ollama_chat_client.to_dict()

    assert isinstance(serialized, dict)
    assert serialized["host"] == ollama_unit_test_env["OLLAMA_HOST"]
    assert serialized["model"] == ollama_unit_test_env["OLLAMA_MODEL"]


def test_chat_middleware(ollama_unit_test_env: dict[str, str]) -> None:
    @chat_middleware
    async def sample_middleware(context, call_next):
        await call_next()

    ollama_chat_client = OllamaChatClient(middleware=[sample_middleware])
    assert len(ollama_chat_client.middleware) == 1
    assert ollama_chat_client.middleware[0] == sample_middleware


def test_additional_properties(ollama_unit_test_env: dict[str, str]) -> None:
    additional_properties = {
        "user_location": {
            "country": "US",
            "city": "Seattle",
        }
    }

    ollama_chat_client = OllamaChatClient(
        additional_properties=additional_properties,
    )

    assert ollama_chat_client.additional_properties == additional_properties


# region CMC


async def test_empty_messages() -> None:
    ollama_chat_client = OllamaChatClient(
        host="http://localhost:12345",
        model="test-model",
    )
    with pytest.raises(ChatClientInvalidRequestException):
        await ollama_chat_client.get_response(messages=[])


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_chat_completion_response: OllamaChatResponse,
) -> None:
    mock_chat.return_value = mock_chat_completion_response
    chat_history.append(Message(contents=["hello world"], role="system"))
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(messages=chat_history)

    assert result.text == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_response_format_dict(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
) -> None:
    mock_chat.return_value = OllamaChatResponse(
        message=OllamaMessage(content='{"answer": "test"}', role="assistant"),
        model="test",
        eval_count=1,
        prompt_eval_count=1,
        created_at="2024-01-01T00:00:00Z",
    )
    chat_history.append(Message(contents=["hello world"], role="system"))
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(
        messages=chat_history,
        options={"response_format": {"type": "object", "properties": {"answer": {"type": "string"}}}},
    )

    assert result.value is not None
    assert isinstance(result.value, dict)
    assert result.value["answer"] == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_reasoning(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_chat_completion_response_reasoning: OllamaChatResponse,
) -> None:
    mock_chat.return_value = mock_chat_completion_response_reasoning
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(messages=chat_history)

    # Read the last message without mutating the response's message list
    reasoning = "".join(c.text for c in result.messages[-1].contents if c.type == "text_reasoning")
    assert reasoning == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_chat_failure(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
) -> None:
    # Simulate a failure in the Ollama client
    mock_chat.side_effect = Exception("Connection error")
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()

    with pytest.raises(ChatClientException) as exc_info:
        await ollama_client.get_response(messages=chat_history)

    assert "Ollama chat request failed" in str(exc_info.value)
    assert "Connection error" in str(exc_info.value)


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_streaming(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_streaming_chat_completion_response: AsyncStream[OllamaChatResponse],
) -> None:
    mock_chat.return_value = mock_streaming_chat_completion_response
    chat_history.append(Message(contents=["hello world"], role="system"))
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = ollama_client.get_response(messages=chat_history, stream=True)

    async for chunk in result:
        assert chunk.text == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_streaming_reasoning(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_streaming_chat_completion_response_reasoning: AsyncStream[OllamaChatResponse],
) -> None:
    mock_chat.return_value = mock_streaming_chat_completion_response_reasoning
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = ollama_client.get_response(messages=chat_history, stream=True)

    async for chunk in result:
        reasoning = "".join(c.text for c in chunk.contents if c.type == "text_reasoning")
        assert reasoning == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_streaming_chat_failure(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
) -> None:
    # Simulate a failure in the Ollama client for streaming
    mock_chat.side_effect = Exception("Streaming connection error")
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()

    with pytest.raises(ChatClientException) as exc_info:
        async for _ in ollama_client.get_response(messages=chat_history, stream=True):
            pass

    assert "Ollama streaming chat request failed" in str(exc_info.value)
    assert "Streaming connection error" in str(exc_info.value)


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_streaming_with_tool_call(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_streaming_chat_completion_response: AsyncStream[OllamaChatResponse],
    mock_streaming_chat_completion_tool_call: AsyncStream[OllamaChatResponse],
) -> None:
    # First call returns the tool call; the follow-up call returns the final text.
    mock_chat.side_effect = [
        mock_streaming_chat_completion_tool_call,
        mock_streaming_chat_completion_response,
    ]
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    result = ollama_client.get_response(messages=chat_history, stream=True, options={"tools": [hello_world]})

    chunks: list[ChatResponseUpdate] = []
    async for chunk in result:
        chunks.append(chunk)

    # Check the parsed tool call
    assert chunks[0].contents[0].type == "function_call"
    tool_call = chunks[0].contents[0]
    assert tool_call.name == "hello_world"
    assert tool_call.arguments == {"arg1": "value1"}

    # Check the result of invoking hello_world
    assert chunks[1].contents[0].type == "function_result"
    tool_result = chunks[1].contents[0]
    assert tool_result.result == "Hello World"

    # Check the final text response
    assert chunks[2].contents[0].type == "text"
    text_result = chunks[2].contents[0]
    assert text_result.text == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_with_dict_tool_passthrough(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_chat_completion_response: OllamaChatResponse,
) -> None:
    """Test that dict-based tools are passed through to Ollama."""
    mock_chat.return_value = mock_chat_completion_response
    chat_history.append(Message(contents=["hello world"], role="user"))

    ollama_client = OllamaChatClient()
    await ollama_client.get_response(
        messages=chat_history,
        options={
            "tools": [{"type": "function", "function": {"name": "custom_tool", "parameters": {}}}],
        },
    )

    # Verify the tool was passed through to the Ollama client
    mock_chat.assert_called_once()
    call_kwargs = mock_chat.call_args.kwargs
    assert "tools" in call_kwargs
    assert call_kwargs["tools"] == [{"type": "function", "function": {"name": "custom_tool", "parameters": {}}}]


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_with_data_content_type(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_chat_completion_response: OllamaChatResponse,
) -> None:
    mock_chat.return_value = mock_chat_completion_response
    chat_history.append(
        Message(
            contents=[Content.from_uri(uri="data:image/png;base64,xyz", media_type="image/png")],
            role="user",
        )
    )

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(messages=chat_history)

    assert result.text == "test"


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_with_invalid_data_content_media_type(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_streaming_chat_completion_response: AsyncStream[OllamaChatResponse],
) -> None:
    mock_chat.return_value = mock_streaming_chat_completion_response
    # Audio data content is not a media type the Ollama client accepts
    chat_history.append(
        Message(
            contents=[Content.from_uri(uri="data:audio/mp3;base64,xyz", media_type="audio/mp3")],
            role="user",
        )
    )
    ollama_client = OllamaChatClient()

    with pytest.raises(ChatClientInvalidRequestException):
        await ollama_client.get_response(messages=chat_history)


@patch.object(AsyncClient, "chat", new_callable=AsyncMock)
async def test_cmc_with_invalid_content_type(
    mock_chat: AsyncMock,
    ollama_unit_test_env: dict[str, str],
    chat_history: list[Message],
    mock_chat_completion_response: OllamaChatResponse,
) -> None:
    mock_chat.return_value = mock_chat_completion_response
    # Remote URIs are not supported by the Ollama client
    chat_history.append(
        Message(
            contents=[Content.from_uri(uri="http://example.com/image.png", media_type="image/png")],
            role="user",
        )
    )
    ollama_client = OllamaChatClient()

    with pytest.raises(ChatClientInvalidRequestException):
        await ollama_client.get_response(messages=chat_history)


@pytest.mark.flaky
@pytest.mark.integration
@skip_if_azure_integration_tests_disabled
async def test_cmc_integration_with_tool_call(
    chat_history: list[Message],
) -> None:
    chat_history.append(Message(contents=["Call the greet function and repeat what it says"], role="user"))

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(messages=chat_history, options={"tools": [greet]})

    assert "hello" in result.text.lower() and "world" in result.text.lower()
    assert result.messages[-2].contents[0].type == "function_result"
    tool_result = result.messages[-2].contents[0]
    assert tool_result.result == "Hello World"


@pytest.mark.flaky
@pytest.mark.integration
@skip_if_azure_integration_tests_disabled
async def test_cmc_integration_with_chat_completion(
    chat_history: list[Message],
) -> None:
    chat_history.append(Message(contents=["Say Hello World"], role="user"))

    ollama_client = OllamaChatClient()
    result = await ollama_client.get_response(messages=chat_history)

    assert "hello" in result.text.lower()


@pytest.mark.flaky
@pytest.mark.integration
@skip_if_azure_integration_tests_disabled
async def test_cmc_streaming_integration_with_tool_call(
    chat_history: list[Message],
) -> None:
    chat_history.append(Message(contents=["Call the greet function and repeat what it says"], role="user"))

    ollama_client = OllamaChatClient()
    result: AsyncIterable[ChatResponseUpdate] = ollama_client.get_response(
        messages=chat_history, stream=True, options={"tools": [greet]}
    )

    chunks: list[ChatResponseUpdate] = []
    async for chunk in result:
        chunks.append(chunk)

    for c in chunks:
        if len(c.contents) > 0:
            if c.contents[0].type == "function_result":
                tool_result = c.contents[0]
                assert tool_result.result == "Hello World"
            if c.contents[0].type == "function_call":
                tool_call = c.contents[0]
                assert tool_call.name == "greet"


@pytest.mark.flaky
@pytest.mark.integration
@skip_if_azure_integration_tests_disabled
async def test_cmc_streaming_integration_with_chat_completion(
    chat_history: list[Message],
) -> None:
    chat_history.append(Message(contents=["Say Hello World"], role="user"))

    ollama_client = OllamaChatClient()
    result: AsyncIterable[ChatResponseUpdate] = ollama_client.get_response(messages=chat_history, stream=True)

    full_text = ""
    async for chunk in result:
        full_text += chunk.text

    assert "hello" in full_text.lower() and "world" in full_text.lower()