Files
agent-framework/python/samples/getting_started/multimodal_input
T
Eduard van Valkenburg 838a7fd61d Python: [BREAKING] Types API Review improvements (#3647)
* Replace Role and FinishReason classes with NewType + Literal

- Remove EnumLike metaclass from _types.py
- Replace Role class with NewType('Role', str) + RoleLiteral
- Replace FinishReason class with NewType('FinishReason', str) + FinishReasonLiteral
- Update all usages across codebase to use string literals
- Remove .value access patterns (direct string comparison now works)
- Add backward compatibility for legacy dict serialization format
- Update tests to reflect new string-based types

Addresses #3591, #3615

* Simplify ChatResponse and AgentResponse type hints (#3592)

- Remove overloads from ChatResponse.__init__
- Remove text parameter from ChatResponse.__init__
- Remove | dict[str, Any] from finish_reason and usage_details params
- Remove **kwargs from AgentResponse.__init__
- Both now accept ChatMessage | Sequence[ChatMessage] | None for messages
- Update docstrings and examples to reflect changes
- Fix tests that were using removed kwargs
- Fix Role type hint usage in ag-ui utils

* Remove text parameter from ChatResponseUpdate and AgentResponseUpdate (#3597)

- Remove text parameter from ChatResponseUpdate.__init__
- Remove text parameter from AgentResponseUpdate.__init__
- Remove **kwargs from both update classes
- Simplify contents parameter type to Sequence[Content] | None
- Update all usages to use contents=[Content.from_text(...)] pattern
- Fix imports in test files
- Update docstrings and examples

* Rename from_chat_response_updates to from_updates (#3593)

- ChatResponse.from_chat_response_updates โ†’ ChatResponse.from_updates
- ChatResponse.from_chat_response_generator โ†’ ChatResponse.from_update_generator
- AgentResponse.from_agent_run_response_updates โ†’ AgentResponse.from_updates

* Remove try_parse_value method from ChatResponse and AgentResponse (#3595)

- Remove try_parse_value method from ChatResponse
- Remove try_parse_value method from AgentResponse
- Remove try_parse_value calls from from_updates and from_update_generator methods
- Update samples to use try/except with response.value instead
- Update tests to use response.value pattern
- Users should now use response.value with try/except for safe parsing

* Add agent_id to AgentResponse and clarify author_name documentation (#3596)

- Add agent_id parameter to AgentResponse class
- Document that author_name is on ChatMessage objects, not responses
- Update ChatResponse docstring with author_name note
- Update AgentResponse docstring with author_name note

* Simplify ChatMessage.__init__ signature (#3618)

- Make contents a positional argument accepting Sequence[Content | str]
- Auto-convert strings in contents to TextContent
- Remove overloads, keep text kwarg for backward compatibility with serialization
- Update _parse_content_list to handle string items
- Update all usages across codebase to use new format: ChatMessage("role", ["text"])

* Allow Content as input on run and get_response

- Update prepare_messages and normalize_messages to accept Content
- Update type signatures in _agents.py and _clients.py
- Add tests for Content input handling

* Fix ChatMessage usage across packages and samples

Update all remaining ChatMessage(role=..., text=...) to use new
ChatMessage('role', ['text']) signature.

* Fix Role string usage and response format parsing

- Fix redis provider: remove .value access on string literals
- Fix durabletask ensure_response_format: set _response_format before accessing .value

* Fix ollama .value and ai_model_id issues, handle None in content list

- Fix ollama _chat_client: remove .value on string literals
- Fix ollama _chat_client: rename ai_model_id to model_id
- Fix _parse_content_list: skip None values gracefully

* Fix A2AAgent type signature to include Content

* Fix Role/FinishReason NewType dict annotations and improve test coverage to 95%

* Fix mypy errors for Role/FinishReason NewType usage

* Fix Role.TOOL and Role.ASSISTANT usage in _orchestrator_helpers.py

* Fix Role NewType usage in durabletask _models.py
838a7fd61d ยท 2026-02-04 10:13:23 +00:00
History
..

Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

Examples

OpenAI Chat Client

  • File: openai_chat_multimodal.py
  • Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
  • Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Azure OpenAI Chat Client

  • File: azure_chat_multimodal.py
  • Description: Shows how to send images to Azure OpenAI Chat Completions API
  • Supported formats: PNG/JPEG images (PDF files are NOT supported by Chat Completions API)

Azure OpenAI Responses Client

  • File: azure_responses_multimodal.py
  • Description: Shows how to send images and PDF files to Azure OpenAI Responses API
  • Supported formats: PNG/JPEG images, PDF documents (full multimodal support)

Environment Variables

Set the following environment variables before running the examples:

For OpenAI:

  • OPENAI_API_KEY: Your OpenAI API key

For Azure OpenAI:

  • AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint
  • AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: The name of your Azure OpenAI chat model deployment
  • AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME: The name of your Azure OpenAI responses model deployment

Optionally for Azure OpenAI:

  • AZURE_OPENAI_API_VERSION: The API version to use (default is 2024-10-21)
  • AZURE_OPENAI_API_KEY: Your Azure OpenAI API key (if not using AzureCliCredential)

Note: You can also provide configuration directly in code instead of using environment variables:

# Example: Pass deployment_name directly
client = AzureOpenAIChatClient(
    credential=AzureCliCredential(),
    deployment_name="your-deployment-name",
    endpoint="https://your-resource.openai.azure.com"
)

Authentication

The Azure example uses AzureCliCredential for authentication. Run az login in your terminal before running the example, or replace AzureCliCredential with your preferred authentication method (e.g., provide api_key parameter).

Running the Examples

# Run OpenAI example
python openai_chat_multimodal.py

# Run Azure Chat example (requires az login or API key)
python azure_chat_multimodal.py

# Run Azure Responses example (requires az login or API key)
python azure_responses_multimodal.py

Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files:

import base64

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
    image_base64 = base64.b64encode(image_data).decode('utf-8')
    image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent
Content.from_uri(
    uri=image_uri,
    media_type="image/jpeg"
)

Method 2: Raw bytes

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Use in DataContent
Content.from_data(
    data=image_bytes,
    media_type="image/jpeg"
)

Supported File Types

Type Formats Notes
Images PNG, JPEG, GIF, WebP Most common image formats
Audio WAV, MP3 For transcription and analysis
Documents PDF Text extraction and analysis

API Differences

  • OpenAI Chat Completions API: Supports images, audio, and PDF files
  • Azure OpenAI Chat Completions API: Supports images only (no PDF/audio file types)
  • Azure OpenAI Responses API: Supports images and PDF files (full multimodal support)

Choose the appropriate client based on your multimodal needs and available APIs.