mirror of
https://github.com/microsoft/agent-framework.git
Multimodal Input Examples
This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.
Examples
OpenAI Chat Client
- File: openai_chat_multimodal.py
- Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
- Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents
Azure Chat Client
- File: azure_chat_multimodal.py
- Description: Shows how to send multimodal content to the Azure OpenAI service
- Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents
Running the Examples
- Set your API keys:
    export OPENAI_API_KEY="your-openai-key"
    export AZURE_OPENAI_API_KEY="your-azure-key"
    export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
- Run an example (file names as listed under Examples):
    python openai_chat_multimodal.py
    python azure_chat_multimodal.py
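Before running either script, it can help to confirm that the variables from the export commands above are actually set. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the Agent Framework.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.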
Using Your Own Files
The examples include small embedded test files for demonstration. To use your own files:
Method 1: Data URIs (recommended)
import base64

from agent_framework import DataContent

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
image_base64 = base64.b64encode(image_data).decode('utf-8')
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent
DataContent(
    uri=image_uri,
    media_type="image/jpeg"
)
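When several files are involved, the encoding step above can be wrapped in a small helper. This is a sketch under stated assumptions: `to_data_uri` is a hypothetical name, not an Agent Framework function, and the media type is guessed from the file extension with the standard-library `mimetypes` module, which may not recognize every extension.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path, default_media_type="application/octet-stream"):
    """Return (data_uri, media_type) for the file at `path`."""
    path = Path(path)
    # Guess the media type from the file extension, e.g. ".jpg" -> "image/jpeg".
    media_type, _ = mimetypes.guess_type(path.name)
    media_type = media_type or default_media_type
    # Base64 output is pure ASCII, so decoding as ASCII is safe.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{encoded}", media_type
```

The returned pair lines up with the `uri` and `media_type` arguments used in the DataContent call above.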
Method 2: Raw bytes
# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Use in DataContent
DataContent(
    data=image_bytes,
    media_type="image/jpeg"
)
Supported File Types
| Type | Formats | Notes |
|---|---|---|
| Images | PNG, JPEG, GIF, WebP | Most common image formats |
| Audio | WAV, MP3 | For transcription and analysis |
| Documents | PDF | Text extraction and analysis |
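If the media type of a file is not known in advance, it can often be inferred from the first few bytes. The sketch below covers only the formats in the table above; `sniff_media_type` is a hypothetical helper, the signature checks are heuristics (the MP3 frame-sync test in particular can match other MPEG audio streams), and the media type strings are common choices rather than values confirmed by the examples.

```python
def sniff_media_type(data: bytes):
    """Guess a media type from well-known leading bytes; return None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    # WebP and WAV both use the RIFF container; bytes 8-11 name the format.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    # MP3 data usually starts with an ID3v2 tag or an MPEG frame sync
    # (eleven consecutive set bits: 0xFF followed by a byte with its top 3 bits set).
    if data.startswith(b"ID3"):
        return "audio/mpeg"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    return None
```

The JPEG check runs before the MP3 frame-sync check on purpose, since both formats begin with a `0xFF` byte.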
API Differences
- Chat Completions API: Supports images, audio, and PDF files
- Assistants API: Only supports text and images (no audio/PDF)
- Responses API: Multimodal support similar to the Chat Completions API
Choose the appropriate client based on your multimodal needs.
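One way to make that choice explicit is a small lookup table. The sketch below transcribes the list above into a dictionary; the names `API_CAPABILITIES` and `apis_supporting` are hypothetical, and the Responses API entry is an assumption, since the list only describes it as similar to Chat Completions.

```python
# Capabilities transcribed from the list above. The "responses" entry is an
# assumption: the text only describes it as similar to Chat Completions.
API_CAPABILITIES = {
    "chat_completions": frozenset({"text", "image", "audio", "pdf"}),
    "assistants": frozenset({"text", "image"}),
    "responses": frozenset({"text", "image", "audio", "pdf"}),
}


def apis_supporting(required):
    """Return the API names whose capabilities cover every required content kind."""
    required = set(required)
    return sorted(
        name for name, caps in API_CAPABILITIES.items() if required <= caps
    )


if __name__ == "__main__":
    # Which APIs could accept both an image and a PDF in one request?
    print(apis_supporting({"image", "pdf"}))
```

Treat the table as a starting point and check it against the current provider documentation before relying on it.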