Eduard van Valkenburg 9355329dfd Python: [BREAKING] updated structure and samples (#875)
9355329dfd · 2025-09-25 07:02:53 +00:00

Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

Examples

OpenAI Chat Client

  • File: openai_chat_multimodal.py
  • Description: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
  • Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Azure Chat Client

  • File: azure_chat_multimodal.py
  • Description: Shows how to send multimodal content to the Azure OpenAI service
  • Supported formats: PNG/JPEG images, WAV/MP3 audio, PDF documents

Running the Examples

  1. Set your API keys and Azure endpoint:

    export OPENAI_API_KEY="your-openai-key"
    export AZURE_OPENAI_API_KEY="your-azure-key"
    export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
    
  2. Run an example:

    python openai_chat_multimodal.py
    python azure_chat_multimodal.py
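
Before running a sample, it can help to confirm that the variables from step 1 are actually set. The following is a minimal sketch, not part of the samples: the helper name `missing_env_vars` and the `REQUIRED_VARS` grouping are hypothetical, and only the variable names come from the export commands above.

```python
import os

# Variable names are taken from the export commands in step 1.
# The grouping by provider is an assumption made for this sketch.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
}


def missing_env_vars(provider, environ=None):
    """Return the required variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.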
    

Using Your Own Files

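The examples include small embedded test files for demonstration. To use your own files, pass them to DataContent either as a base64 data URI or as raw bytes.

Method 1: Base64 data URI

import base64

# Assumes DataContent is exported by the agent_framework package
from agent_framework import DataContent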

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
    image_base64 = base64.b64encode(image_data).decode('utf-8')
    image_uri = f"data:image/jpeg;base64,{image_base64}"

# Use in DataContent
DataContent(
    uri=image_uri,
    media_type="image/jpeg"
)
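
The encoding step above can be wrapped in a small reusable helper. This is a sketch using only the standard library; the function names `bytes_to_data_uri` and `file_to_data_uri` are hypothetical and are not part of the Agent Framework. The media type is guessed from the file extension when not given explicitly.

```python
import base64
import mimetypes


def bytes_to_data_uri(data, media_type):
    """Encode raw bytes as a base64 data URI with the given media type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def file_to_data_uri(path, media_type=None):
    """Read a file and return (data_uri, media_type).

    If media_type is omitted, it is guessed from the file extension.
    """
    if media_type is None:
        media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"cannot determine media type for {path!r}")
    with open(path, "rb") as f:
        return bytes_to_data_uri(f.read(), media_type), media_type
```

The returned URI and media type can then be passed as the `uri` and `media_type` arguments shown above.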

Method 2: Raw bytes

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Use in DataContent
DataContent(
    data=image_bytes,
    media_type="image/jpeg"
)
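
Both methods carry the same payload: a base64 data URI is the raw bytes plus a media type, encoded as text. The sketch below illustrates that by decoding a data URI back into its parts. The helper name `split_data_uri` is hypothetical, and it handles only the `data:<media type>;base64,<payload>` form used in Method 1.

```python
import base64


def split_data_uri(uri):
    """Split a base64 data URI into (media_type, raw_bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected data:<media type>;base64,<payload>")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


if __name__ == "__main__":
    media_type, raw = split_data_uri("data:image/jpeg;base64,aGk=")
    print(media_type, raw)
```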

Supported File Types

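| Type      | Formats              | Notes                          |
|-----------|----------------------|--------------------------------|
| Images    | PNG, JPEG, GIF, WebP | Most common image formats      |
| Audio     | WAV, MP3             | For transcription and analysis |
| Documents | PDF                  | Text extraction and analysis   |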
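
A lookup from file extension to media type is one way to reject unsupported files before sending them. This is a sketch: the `SUPPORTED_MEDIA_TYPES` mapping and the `media_type_for` helper are hypothetical, and the media type strings are the commonly used ones for the formats listed above (in particular, `audio/wav` is an assumption; some systems use other names for WAV).

```python
from pathlib import Path

# Extensions for the formats listed above, mapped to commonly used media types.
# The exact strings are assumptions, not values taken from the samples.
SUPPORTED_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".pdf": "application/pdf",
}


def media_type_for(path):
    """Return the media type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
    return SUPPORTED_MEDIA_TYPES[suffix]
```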

API Differences

  • Chat Completions API: Supports images, audio, and PDF files
  • Assistants API: Only supports text and images (no audio/PDF)
  • Responses API: Similar to Chat Completions

Choose the appropriate client based on your multimodal needs.
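
The choice can be expressed as a simple capability lookup. The sketch below encodes only the first two bullets; the Responses API is left out because "similar to Chat Completions" does not pin down its exact capabilities. The names `API_MODALITIES` and `apis_supporting` are hypothetical.

```python
# Capabilities as stated in the list above. Text support for
# Chat Completions is assumed; the Responses API is omitted on purpose.
API_MODALITIES = {
    "chat_completions": {"text", "image", "audio", "pdf"},
    "assistants": {"text", "image"},
}


def apis_supporting(required):
    """Return the names of APIs that support every modality in `required`."""
    needed = set(required)
    return sorted(
        name for name, supported in API_MODALITIES.items() if needed <= supported
    )


if __name__ == "__main__":
    print(apis_supporting({"image", "audio"}))
    print(apis_supporting({"image"}))
```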