Multimodal Input Examples
This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.
Examples
OpenAI Chat Client
- File: `openai_chat_multimodal.py`
- Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
- Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents
Azure Chat Client
- File: `azure_chat_multimodal.py`
- Description: Shows how to send multimodal content to the Azure OpenAI service
- Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents
Environment Variables
Set the following environment variables before running the examples:
For OpenAI:

- `OPENAI_API_KEY`: Your OpenAI API key

For Azure OpenAI:

- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint
- `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`: The name of your Azure OpenAI chat model deployment

Optionally for Azure OpenAI:

- `AZURE_OPENAI_API_VERSION`: The API version to use (default is `2024-10-21`)
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key (if not using `AzureCliCredential`)
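Note: You can also provide configuration directly in code instead of using environment variables:

```python
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

# Example: pass deployment_name and endpoint directly instead of reading them from the environment
client = AzureOpenAIChatClient(
    credential=AzureCliCredential(),
    deployment_name="your-deployment-name",
    endpoint="https://your-resource.openai.azure.com",
)
```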
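Before running either script, it can help to fail fast when a required variable is missing. The following is a minimal, standard-library-only sketch and not part of Agent Framework: the helper name `missing_env_vars` and the provider labels `"openai"` and `"azure"` are hypothetical, and the variable lists simply restate the required ones listed above.

```python
import os
from typing import List, Mapping, Optional

# Required variables per provider, restating the lists in this README.
# The provider keys are illustrative labels, not Agent Framework identifiers.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
}


def missing_env_vars(provider: str, environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    # Report the status of each provider's required variables.
    for provider in REQUIRED_ENV_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing a plain dict as `environ` lets you exercise the check without touching the real environment.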
Authentication
The Azure example uses `AzureCliCredential` for authentication. Run `az login` in your terminal before running the example, or replace `AzureCliCredential` with your preferred authentication method (for example, pass the `api_key` parameter).
Running the Examples
```bash
# Run OpenAI example
python openai_chat_multimodal.py

# Run Azure example (requires az login or API key)
python azure_chat_multimodal.py
```
Using Your Own Files
The examples include small embedded test files for demonstration. To use your own files:
Method 1: Data URIs (recommended)
```python
import base64

from agent_framework import DataContent

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent
content = DataContent(
    uri=image_uri,
    media_type="image/jpeg",
)
```
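If you load several files, the encoding steps above can be wrapped in a small helper. This is a standard-library-only sketch; `bytes_to_data_uri` and `file_to_data_uri` are hypothetical names, and guessing the media type with `mimetypes.guess_type` is an assumption that may not match what the service expects for every format, so pass `media_type` explicitly when in doubt.

```python
import base64
import mimetypes
from typing import Optional


def bytes_to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI with the given media type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def file_to_data_uri(path: str, media_type: Optional[str] = None) -> str:
    """Read a file and return it as a base64 data URI.

    If media_type is omitted, guess it from the file extension
    (guess_type only inspects the name; it does not read the file).
    """
    if media_type is None:
        media_type, _ = mimetypes.guess_type(path)
        if media_type is None:
            raise ValueError(f"cannot guess media type for {path!r}; pass media_type explicitly")
    with open(path, "rb") as f:
        return bytes_to_data_uri(f.read(), media_type)
```

The returned string can be passed as the `uri` argument of `DataContent`, as in Method 1.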
Method 2: Raw bytes
```python
from agent_framework import DataContent

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Use in DataContent
content = DataContent(
    data=image_bytes,
    media_type="image/jpeg",
)
```
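Both methods carry the same bytes: a base64 data URI is just the media type plus the encoded payload. To check that a URI you built round-trips to the original file contents, you can reverse Method 1. This is a minimal sketch; `parse_base64_data_uri` is a hypothetical helper that handles only the `data:<media_type>;base64,<payload>` form used above.

```python
import base64
from typing import Tuple


def parse_base64_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split 'data:<media_type>;base64,<payload>' into (media_type, raw bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a URI of the form data:<media_type>;base64,<payload>")
    media_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet instead of silently dropping them
    return media_type, base64.b64decode(payload, validate=True)
```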
Supported File Types
| Type | Formats | Notes |
|---|---|---|
| Images | PNG, JPEG, GIF, WebP | Most common image formats |
| Audio | WAV, MP3 | For transcription and analysis |
| Documents | PDF | Text extraction and analysis |
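When accepting arbitrary user files, you may want to map extensions to media types and reject anything outside the table before calling the API. This is a sketch built around a hypothetical helper, `media_type_for`; the specific media-type strings (for example `audio/wav` rather than `audio/x-wav`, and `audio/mpeg` for MP3) are assumptions, so match whatever the sample scripts pass to `DataContent`.

```python
from pathlib import Path

# Extension -> media type for the formats listed in the table above.
# The exact strings are assumptions; align them with the sample scripts.
SUPPORTED_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".pdf": "application/pdf",
}


def media_type_for(path: str) -> str:
    """Return the media type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_MEDIA_TYPES[suffix]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MEDIA_TYPES))
        raise ValueError(
            f"unsupported file type {suffix or '(none)'!r}; expected one of: {supported}"
        ) from None
```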
API Differences
- Chat Completions API: Supports images, audio, and PDF files
- Assistants API: Only supports text and images (no audio/PDF)
- Responses API: Similar to Chat Completions
Choose the appropriate client based on your multimodal needs.
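The list above can be encoded as a pre-flight check that flags content an API will not accept. This is only a sketch: `API_SUPPORT`, `content_category`, and `unsupported_for` are hypothetical names, the support sets restate the bullets above rather than any official capability table, and the Responses API is left out because this README describes it only as similar to Chat Completions.

```python
from typing import Iterable, List

# Content categories per API, restating the bullets above (not an official capability list).
API_SUPPORT = {
    "chat_completions": {"text", "image", "audio", "pdf"},
    "assistants": {"text", "image"},
}


def content_category(media_type: str) -> str:
    """Map a media type to the coarse category used in API_SUPPORT."""
    if media_type.startswith("image/"):
        return "image"
    if media_type.startswith("audio/"):
        return "audio"
    if media_type == "application/pdf":
        return "pdf"
    if media_type.startswith("text/"):
        return "text"
    raise ValueError(f"unrecognized media type: {media_type!r}")


def unsupported_for(api: str, media_types: Iterable[str]) -> List[str]:
    """Return the media types that `api` does not accept, in input order."""
    allowed = API_SUPPORT[api]
    return [mt for mt in media_types if content_category(mt) not in allowed]
```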