mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Eduard van Valkenburg 9355329dfd Python: [BREAKING] updated structure and samples (#875)

* updated structure and samples

* updated names and removed cross tests

* updated projects etc

* updated tests

* updated test

* test fixes

* removed devui for now

* updated all-tests task

* removed old style configs

* remove coverage from tests

* updated to unit tests with all-tests

* updated foundry everywhere

* fix azure ai tests

* fix merge tests

* fix mypy

9355329dfd · 2025-09-25 07:02:53 +00:00

azure_chat_multimodal.py

Python: [BREAKING] updated structure and samples (#875)

2025-09-25 07:02:53 +00:00

openai_chat_multimodal.py

Python: Fix Multimodal input bug (#799)

2025-09-17 21:44:42 +00:00

README.md

Python: Fix Multimodal input bug (#799)

2025-09-17 21:44:42 +00:00

Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

Examples

OpenAI Chat Client

File: openai_chat_multimodal.py
Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Azure Chat Client

File: azure_chat_multimodal.py
Description: Shows how to send multimodal content to Azure OpenAI service
Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Running the Examples

Set your API keys and Azure endpoint:

export OPENAI_API_KEY="your-openai-key"
export AZURE_OPENAI_API_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"

Run an example:

python openai_chat_multimodal.py
python azure_chat_multimodal.py
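
Before running either script, it can help to confirm that the variables from the export lines above are actually set. The following is a minimal sketch, not part of the samples: the `REQUIRED_VARS` mapping and the `missing_env_vars` helper are hypothetical names, and the variable lists only reflect what this README mentions.

```python
import os

# Variable names taken from the export lines in this README.
# The grouping by client is an assumption made for illustration.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
}


def missing_env_vars(client, env=None):
    """Return the required variables for `client` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[client] if not env.get(name)]


if __name__ == "__main__":
    for client in REQUIRED_VARS:
        missing = missing_env_vars(client)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{client}: {status}")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.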

Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files:

Method 1: Data URIs (recommended)

import base64

from agent_framework import DataContent

# Load your file and encode it as a base64 data URI
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent
image_content = DataContent(
    uri=image_uri,
    media_type="image/jpeg",
)

Method 2: Raw bytes

from agent_framework import DataContent

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Use in DataContent
image_content = DataContent(
    data=image_bytes,
    media_type="image/jpeg",
)
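
Both methods above hard-code `image/jpeg`. When working with several file types, one option is to derive the media type from the file name. The sketch below uses only the standard library; `file_to_data_uri` is a hypothetical helper, not part of the Agent Framework, and it assumes the file extension reflects the real content.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_uri(path):
    """Return (data_uri, media_type) for the file at `path`.

    The media type is guessed from the file extension only.
    """
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type is None:
        raise ValueError(f"cannot guess media type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{encoded}", media_type
```

The returned pair lines up with the `uri` and `media_type` arguments used in Method 1.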

Supported File Types

Type	Formats	Notes
Images	PNG, JPEG, GIF, WebP	Most common image formats
Audio	WAV, MP3	For transcription and analysis
Documents	PDF	Text extraction and analysis
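
The table above can be turned into a small lookup that rejects unsupported files before anything is sent. This is a sketch under assumptions: `SUPPORTED` and `classify` are hypothetical names, and the media type strings are commonly used values chosen for illustration, not values confirmed by the samples.

```python
from pathlib import Path

# Extension -> (type, media type), following the table above.
# Media type strings are commonly used values (an assumption).
SUPPORTED = {
    ".png": ("image", "image/png"),
    ".jpg": ("image", "image/jpeg"),
    ".jpeg": ("image", "image/jpeg"),
    ".gif": ("image", "image/gif"),
    ".webp": ("image", "image/webp"),
    ".wav": ("audio", "audio/wav"),
    ".mp3": ("audio", "audio/mpeg"),
    ".pdf": ("document", "application/pdf"),
}


def classify(path):
    """Return (type, media_type) for `path`, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type: {suffix or path}")
    return SUPPORTED[suffix]
```

Matching on the lower-cased suffix means `Photo.JPG` and `photo.jpg` are treated the same way.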

API Differences

Chat Completions API: Supports images, audio, and PDF files
Assistants API: Only supports text and images (no audio/PDF)
Responses API: Similar to Chat Completions

Choose the appropriate client based on your multimodal needs.
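
As a rough illustration of that choice, the list above can be encoded as sets and filtered by the content kinds a request needs. `API_SUPPORT` and `apis_supporting` are hypothetical names. The Responses API entry is an assumption, since the list only says it is similar to Chat Completions.

```python
# Content kinds per API, transcribed from the list above.
API_SUPPORT = {
    "chat_completions": {"text", "image", "audio", "pdf"},
    "assistants": {"text", "image"},
    # Assumption: treated as identical to Chat Completions.
    "responses": {"text", "image", "audio", "pdf"},
}


def apis_supporting(kinds):
    """Return the APIs whose supported kinds cover every kind in `kinds`."""
    needed = set(kinds)
    return [api for api, supported in API_SUPPORT.items() if needed <= supported]
```

For example, a request that mixes audio and PDF input would rule out the Assistants API under this table.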