mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Eduard van Valkenburg 6acab3d1d6 Python: [BREAKING] Standardize model selection on model (#4999)

6acab3d1d6 · 2026-04-01 19:00:18 +00:00


Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

Examples

OpenAI Chat Client

File: openai_chat_multimodal.py
Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Azure OpenAI Chat Client

File: azure_chat_multimodal.py
Description: Shows how to send images to the Azure OpenAI Chat Completions API
Supported formats: PNG/JPEG images (PDF files are NOT supported by the Azure OpenAI Chat Completions API)

Azure OpenAI Responses Client

File: azure_responses_multimodal.py
Description: Shows how to send images and PDF files to the Azure OpenAI Responses API
Supported formats: PNG/JPEG images, PDF documents (full multimodal support)

Environment Variables

Set the following environment variables before running the examples:

For OpenAI:

OPENAI_API_KEY: Your OpenAI API key

For Azure OpenAI:

AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint
AZURE_OPENAI_MODEL: The name of your Azure OpenAI model deployment (read by both the Chat and Responses examples)

Optionally for Azure OpenAI:

AZURE_OPENAI_API_VERSION: The API version to use (default is 2024-10-21)
AZURE_OPENAI_API_KEY: Your Azure OpenAI API key (if not using AzureCliCredential)

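Note: You can also provide configuration directly in code instead of using environment variables:

from azure.identity import AzureCliCredential
# Assumed import path for FoundryChatClient; adjust to your installed version
from agent_framework.foundry import FoundryChatClient

# Example: Pass the Foundry project endpoint directly
client = FoundryChatClient(
    credential=AzureCliCredential(),
    project_endpoint="https://your-project.services.ai.azure.com",
    model="your-deployment-name",
)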
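
Before running an example, a quick check like the one below can confirm that the required variables are set. This is only a sketch: the variable names come from the lists above, while the "openai"/"azure" grouping keys and the `missing_env_vars` helper are hypothetical names introduced for illustration.

```python
import os
from collections.abc import Mapping

# Variable names are taken from the lists above. The grouping keys
# ("openai", "azure") are illustrative labels, not framework identifiers.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_MODEL"],
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `provider` that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]
```

Calling `missing_env_vars("azure")` returns an empty list when both Azure variables are present, and otherwise lists the names that still need to be set.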

Authentication

The Azure examples use AzureCliCredential for authentication. Run az login in your terminal before running them, or replace AzureCliCredential with your preferred authentication method (e.g., pass the api_key parameter).

Running the Examples

# Run OpenAI example
python openai_chat_multimodal.py

# Run Azure Chat example (requires az login or API key)
python azure_chat_multimodal.py

# Run Azure Responses example (requires az login or API key)
python azure_responses_multimodal.py

Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files:

Method 1: Data URIs (recommended)

import base64

from agent_framework import Content

# Load your file as raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()

# Base64-encode the bytes and build a data URI
image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Wrap the data URI in a Content object
image_content = Content.from_uri(
    uri=image_uri,
    media_type="image/jpeg",
)
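
The hard-coded image/jpeg in Method 1 can be generalized by inferring the media type from the file name. The following is a minimal sketch using only the standard library; `file_to_data_uri` is a hypothetical helper, not part of the Agent Framework, and `mimetypes` guesses from the file extension only, so pass `media_type` explicitly when the guess may be wrong.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def file_to_data_uri(path: str, media_type: Optional[str] = None) -> tuple[str, str]:
    """Return (data_uri, media_type) for the file at `path`."""
    if media_type is None:
        # Guess from the file extension; returns None for unknown extensions.
        media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"Cannot infer a media type for {path!r}; pass media_type explicitly")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{encoded}", media_type
```

The returned pair can then be passed to `Content.from_uri(uri=..., media_type=...)` in the same way as the hand-built URI in Method 1.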

Method 2: Raw bytes

from agent_framework import Content

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Wrap the raw bytes in a Content object
image_content = Content.from_data(
    data=image_bytes,
    media_type="image/jpeg",
)

Supported File Types

Type	Formats	Notes
Images	PNG, JPEG, GIF, WebP	Most common image formats
Audio	WAV, MP3	For transcription and analysis
Documents	PDF	Text extraction and analysis
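
The table above can be turned into a small lookup that rejects unsupported files before they are sent. This is a sketch only: `classify_file` and `SUPPORTED_TYPES` are hypothetical names, and the media type strings are the commonly used ones for each format, given here as an assumption rather than as values the framework requires.

```python
from pathlib import Path

# Extension -> (category, media type), following the table above.
# The media type strings are the commonly used ones and are an assumption
# here; check what the target API expects.
SUPPORTED_TYPES = {
    ".png": ("Images", "image/png"),
    ".jpg": ("Images", "image/jpeg"),
    ".jpeg": ("Images", "image/jpeg"),
    ".gif": ("Images", "image/gif"),
    ".webp": ("Images", "image/webp"),
    ".wav": ("Audio", "audio/wav"),
    ".mp3": ("Audio", "audio/mpeg"),
    ".pdf": ("Documents", "application/pdf"),
}


def classify_file(path: str) -> tuple[str, str]:
    """Return (category, media_type) for `path`, or raise if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive extension match
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return SUPPORTED_TYPES[suffix]
```

The lookup is by extension only and does not inspect file contents, so a misnamed file will be classified by its name.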

API Differences

OpenAI Chat Completions API: Supports images, audio, and PDF files
Azure OpenAI Chat Completions API: Supports images only (no PDF/audio file types)
Azure OpenAI Responses API: Supports images and PDF files (full multimodal support)

Choose the appropriate client based on your multimodal needs and available APIs.
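
The choice described above can be sketched as a capability lookup. The capabilities mirror the list in this section; the client keys ("openai_chat", "azure_chat", "azure_responses") and the `clients_supporting` helper are hypothetical labels for illustration, not Agent Framework identifiers.

```python
# Capabilities as stated in the API Differences list. The keys are
# hypothetical labels for this sketch, not Agent Framework identifiers.
API_CAPABILITIES = {
    "openai_chat": frozenset({"image", "audio", "pdf"}),
    "azure_chat": frozenset({"image"}),
    "azure_responses": frozenset({"image", "pdf"}),
}


def clients_supporting(*kinds: str) -> list[str]:
    """Return the clients whose API accepts every content kind in `kinds`."""
    needed = set(kinds)
    # A client qualifies when the needed kinds are a subset of its capabilities.
    return [name for name, caps in API_CAPABILITIES.items() if needed <= caps]
```

For example, `clients_supporting("pdf")` returns the OpenAI Chat and Azure Responses entries, while a request that combines audio and PDF leaves only the OpenAI Chat entry.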