
Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

Examples

OpenAI Chat Client

  • File: openai_chat_multimodal.py
  • Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
  • Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Azure Chat Client

  • File: azure_chat_multimodal.py
  • Description: Shows how to send multimodal content to Azure OpenAI service
  • Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Running the Examples

  1. Set your API keys and Azure endpoint:

    export OPENAI_API_KEY="your-openai-key"
    export AZURE_OPENAI_API_KEY="your-azure-key"
    export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
    
  2. Run an example:

    python openai_chat_multimodal.py
    python azure_chat_multimodal.py
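
Before running an example, it can help to confirm that the variables from step 1 are actually set. The following is a minimal sketch, not part of the samples: the `REQUIRED_ENV` table and the `missing_env_vars` helper are hypothetical names, and only the three variable names come from the export commands above.

```python
import os
from typing import Mapping

# Variable names taken from the export commands above.
# The grouping by client ("openai", "azure") is an assumption for illustration.
REQUIRED_ENV = {
    "openai": ("OPENAI_API_KEY",),
    "azure": ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"),
}


def missing_env_vars(client: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `client` that are unset or empty."""
    return [name for name in REQUIRED_ENV[client] if not environ.get(name)]


if __name__ == "__main__":
    # Print one status line per client so a missing key is easy to spot.
    for client in REQUIRED_ENV:
        missing = missing_env_vars(client)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{client}: {status}")
```

Passing `environ` explicitly keeps the check easy to test without touching the real environment.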
    

Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files, pass them to DataContent in one of two ways.

Method 1: Base64 data URI

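import base64

# Assumes DataContent is importable from the top-level agent_framework package
from agent_framework import DataContent

# Load your file and encode it as a base64 data URI
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent; keep media_type consistent with the type in the URI
image_content = DataContent(
    uri=image_uri,
    media_type="image/jpeg",
)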

Method 2: Raw bytes

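# Assumes DataContent is importable from the top-level agent_framework package
from agent_framework import DataContent

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Pass the bytes directly; no manual base64 encoding is needed
image_content = DataContent(
    data=image_bytes,
    media_type="image/jpeg",
)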
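
Both methods hardcode "image/jpeg". For other files, the media type can be guessed from the file extension instead. The sketch below uses only the standard library. The helper names `guess_media_type`, `data_uri_kwargs`, and `raw_bytes_kwargs` are hypothetical, and the returned dictionaries simply mirror the `uri`, `data`, and `media_type` keyword arguments shown above.

```python
import base64
import mimetypes
from pathlib import Path


def guess_media_type(path: str) -> str:
    """Guess a media type from the file extension, e.g. 'image/jpeg' for .jpg."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"Cannot determine media type for {path!r}")
    return media_type


def data_uri_kwargs(path: str) -> dict:
    """Method 1: keyword arguments built around a base64 data URI."""
    media_type = guess_media_type(path)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"uri": f"data:{media_type};base64,{encoded}", "media_type": media_type}


def raw_bytes_kwargs(path: str) -> dict:
    """Method 2: keyword arguments built around the raw bytes."""
    return {"data": Path(path).read_bytes(), "media_type": guess_media_type(path)}
```

Under these assumptions, either result can be unpacked into the constructor, for example `DataContent(**raw_bytes_kwargs("path/to/your/image.jpg"))`.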

Supported File Types

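| Type      | Formats              | Notes                          |
|-----------|----------------------|--------------------------------|
| Images    | PNG, JPEG, GIF, WebP | Most common image formats      |
| Audio     | WAV, MP3             | For transcription and analysis |
| Documents | PDF                  | Text extraction and analysis   |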
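
The supported types can also be expressed as a lookup that rejects unsupported files before any request is sent. This is an illustrative sketch only: `SUPPORTED_TYPES` and `classify_file` are hypothetical names, and the media type strings are commonly used values assumed here, not values confirmed by the samples.

```python
from pathlib import Path

# Extension -> (type from the Supported File Types section, media type).
# The media type strings are assumptions; check what your model accepts.
SUPPORTED_TYPES = {
    ".png": ("Images", "image/png"),
    ".jpg": ("Images", "image/jpeg"),
    ".jpeg": ("Images", "image/jpeg"),
    ".gif": ("Images", "image/gif"),
    ".webp": ("Images", "image/webp"),
    ".wav": ("Audio", "audio/wav"),
    ".mp3": ("Audio", "audio/mpeg"),
    ".pdf": ("Documents", "application/pdf"),
}


def classify_file(path: str) -> tuple[str, str]:
    """Return (type, media_type) for a supported file, else raise ValueError."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUPPORTED_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path!r}") from None
```

For example, `classify_file("scan.PDF")` returns `("Documents", "application/pdf")`, while a `.txt` file raises `ValueError`.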

API Differences

  • Chat Completions API: Supports images, audio, and PDF files
  • Assistants API: Only supports text and images (no audio/PDF)
  • Responses API: Similar to Chat Completions

Choose the appropriate client based on your multimodal needs.
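
One way to make that choice explicit is a small capability table. The sketch below encodes only what the list above states. `SUPPORTED_CONTENT` and `apis_supporting` are hypothetical names, and the Responses API entry assumes that "similar to Chat Completions" means the same set of content types, which may not hold for every model.

```python
SUPPORTED_CONTENT = {
    "Chat Completions API": frozenset({"text", "image", "audio", "pdf"}),
    "Assistants API": frozenset({"text", "image"}),
    # Assumption: "similar to Chat Completions" is read as "same set".
    "Responses API": frozenset({"text", "image", "audio", "pdf"}),
}


def apis_supporting(*content_kinds: str) -> list[str]:
    """List the APIs whose supported content covers every requested kind."""
    needed = set(content_kinds)
    return [api for api, kinds in SUPPORTED_CONTENT.items() if needed <= kinds]


if __name__ == "__main__":
    # An image plus a PDF rules out the Assistants API.
    print(apis_supporting("image", "pdf"))
```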