Files
agent-framework/python/packages/openai
T
Tao Chen 1b6f7d80fd Python: Record actual served model from Azure OpenAI (#5910)
* Record actual served model as response model for Azure OpenAI

* Formatting

* Fix tests

* Fix pipeline error

* Comments

* Address review: surface served model via ChatResponse.model

Apply blocking review feedback from PR #5910:

- Use ChatResponse.model / ChatResponseUpdate.model as the source of truth
  for the Azure x-ms-served-model header value, instead of stashing it in
  additional_properties and overriding it again in observability.
  Observability already reads response.model; the chat client now overwrites
  it post-parse when the served-model header is present. Empirically the
  Azure Responses API returns the deployment alias in body.model and the
  actual snapshot (e.g. gpt-5-nano-2025-08-07) in this header.

- Move the AZURE_OPENAI_SERVED_MODEL_HEADER constant out of observability.py
  and into RawOpenAIChatClient (as the SERVED_MODEL_HEADER ClassVar). The
  header is Azure-OpenAI-Responses-API-specific so observability does not
  need to know about it.

- Revert the streaming text_format path to client.responses.stream(...) and
  drop the _pydantic_model_to_text_format_param helper. That helper imported
  from openai.lib._parsing._responses (a private SDK path) and the swap to
  responses.create(stream=True) dropped client-side output_parsed for
  structured-output streaming. The streaming-with-text_format path is the
  only one that does not surface the served-model header - documented inline.

- Wrap the raw streaming responses in async with so the underlying socket
  closes deterministically (continuation_token retrieve + create paths).

- Fix the empty-string / whitespace-only header at the source by stripping
  in _extract_served_model and returning None when nothing remains.

- Revert unrelated formatting-only churn in _skills.py and test_mcp.py.

- Update unit tests to assert against chat_response.model / update.model
  and add an aggregated streaming assertion plus a pin that the
  streaming-with-text_format path does not get the header.

Verified end-to-end against Azure OpenAI Responses API: deployment alias
gpt-5-nano now reports gpt-5-nano-2025-08-07 as ChatResponse.model in both
the non-streaming and streaming paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: preserve streaming structured output finalization

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f62076ef-558d-49e8-8fe2-f38d527c9639

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* refactor: name streaming response finalizer

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f62076ef-558d-49e8-8fe2-f38d527c9639

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* fix: capture streaming response format after prepare

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f62076ef-558d-49e8-8fe2-f38d527c9639

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* refactor: clarify streaming response format capture

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f62076ef-558d-49e8-8fe2-f38d527c9639

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* test: use public API for streaming structured output

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f62076ef-558d-49e8-8fe2-f38d527c9639

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>

* Inline the served-model header override at its two call sites

The `_apply_served_model_header` helper was a 1-line wrapper around
`_extract_served_model`. Inlining the `if served_model is not None: ...`
matches the pattern already used in the streaming paths and folds the
explanatory docstring onto `_extract_served_model` (which is now the
single place that knows about the header).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
1b6f7d80fd ยท 2026-05-19 06:38:53 +00:00
History
..

agent-framework-openai

OpenAI integration for Microsoft Agent Framework.

This package provides:

  • OpenAIChatClient for the OpenAI Responses API
  • OpenAIChatCompletionClient for the Chat Completions API
  • OpenAIEmbeddingClient for embeddings

Installation

pip install agent-framework-openai

Which chat client should I use?

Use OpenAIChatClient for new work unless you specifically need the Chat Completions API.

  • OpenAIChatClient uses the Responses API and is the preferred general-purpose chat client.
  • OpenAIChatCompletionClient uses the Chat Completions API and is mainly for compatibility with existing Chat Completions-based integrations.

The previous deprecated Responses alias has been removed. Use OpenAIChatClient directly.

Environment variables

OpenAI

These variables are used when the client is configured for OpenAI:

Variable Purpose
OPENAI_API_KEY OpenAI API key
OPENAI_ORG_ID OpenAI organization ID
OPENAI_BASE_URL Custom OpenAI-compatible base URL
OPENAI_MODEL Generic fallback model
OPENAI_CHAT_MODEL Preferred model for OpenAIChatClient
OPENAI_CHAT_COMPLETION_MODEL Preferred model for OpenAIChatCompletionClient
OPENAI_EMBEDDING_MODEL Preferred model for OpenAIEmbeddingClient

Model lookup order:

  • OpenAIChatClient: OPENAI_CHAT_MODEL -> OPENAI_MODEL
  • OpenAIChatCompletionClient: OPENAI_CHAT_COMPLETION_MODEL -> OPENAI_MODEL
  • OpenAIEmbeddingClient: OPENAI_EMBEDDING_MODEL -> OPENAI_MODEL

These model variables are only consulted when you do not pass model= directly. In other words, OpenAIChatClient(model="...") ignores OPENAI_CHAT_MODEL, and OpenAIChatCompletionClient(model="...") ignores OPENAI_CHAT_COMPLETION_MODEL.

Azure OpenAI

These variables are used when the client is configured for Azure OpenAI:

Variable Purpose
AZURE_OPENAI_ENDPOINT Azure OpenAI resource endpoint
AZURE_OPENAI_BASE_URL Full Azure OpenAI base URL (.../openai/v1)
AZURE_OPENAI_API_KEY Azure OpenAI API key
AZURE_OPENAI_API_VERSION Azure OpenAI API version
AZURE_OPENAI_MODEL Generic fallback deployment
AZURE_OPENAI_CHAT_MODEL Preferred deployment for OpenAIChatClient
AZURE_OPENAI_CHAT_COMPLETION_MODEL Preferred deployment for OpenAIChatCompletionClient
AZURE_OPENAI_EMBEDDING_MODEL Preferred deployment for OpenAIEmbeddingClient

Deployment lookup order:

  • OpenAIChatClient: AZURE_OPENAI_CHAT_MODEL -> AZURE_OPENAI_MODEL
  • OpenAIChatCompletionClient: AZURE_OPENAI_CHAT_COMPLETION_MODEL -> AZURE_OPENAI_MODEL
  • OpenAIEmbeddingClient: AZURE_OPENAI_EMBEDDING_MODEL -> AZURE_OPENAI_MODEL

For Azure routing, the same rule applies: the client-specific deployment variable is checked first, then the generic AZURE_OPENAI_MODEL fallback. Passing model= overrides both environment variables.

When both OpenAI and Azure environment variables are present, the generic clients prefer OpenAI when OPENAI_API_KEY is configured. To use Azure explicitly, pass azure_endpoint or credential.

OpenAI example

from agent_framework.openai import OpenAIChatClient

client = OpenAIChatClient(model="gpt-4.1")

Azure OpenAI example

from azure.identity.aio import AzureCliCredential

from agent_framework.openai import OpenAIChatClient

client = OpenAIChatClient(
    model="my-responses-deployment",
    azure_endpoint="https://my-resource.openai.azure.com",
    credential=AzureCliCredential(),
)

ChatClient vs ChatCompletionClient

Use OpenAIChatClient when you want the Responses API as your default chat surface.

Use OpenAIChatCompletionClient when you specifically need the Chat Completions API:

from agent_framework.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(model="gpt-4o-mini")