# Multimodal Input Examples

This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.

## Examples

### OpenAI Chat Client

- **File**: `openai_chat_multimodal.py`
- **Description**: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
- **Supported formats**: PNG/JPEG images, WAV/MP3 audio, PDF documents

### Azure OpenAI Chat Client

- **File**: `azure_chat_multimodal.py`
- **Description**: Shows how to send images to the Azure OpenAI Chat Completions API
- **Supported formats**: PNG/JPEG images (PDF files are NOT supported by the Azure OpenAI Chat Completions API)

### Azure OpenAI Responses Client

- **File**: `azure_responses_multimodal.py`
- **Description**: Shows how to send images and PDF files to the Azure OpenAI Responses API
- **Supported formats**: PNG/JPEG images, PDF documents

## Environment Variables

Set the following environment variables before running the examples:

**For OpenAI:**

- `OPENAI_API_KEY`: Your OpenAI API key

**For Azure OpenAI:**

- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint
- `AZURE_OPENAI_MODEL`: The name of your Azure OpenAI model deployment (used by both the chat and responses examples)

Optionally for Azure OpenAI:

- `AZURE_OPENAI_API_VERSION`: The API version to use (default is `2024-10-21`)
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key (if not using `AzureCliCredential`)

**Note:** You can also provide configuration directly in code instead of using environment variables:
```python
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential

# Example: Pass the Foundry project endpoint directly
client = FoundryChatClient(
    credential=AzureCliCredential(),
    project_endpoint="https://your-project.services.ai.azure.com",
    model="your-deployment-name",
)
```

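Before running an example, it can help to check that the variables listed above are actually set. The following is a minimal sketch, not part of the samples: the `REQUIRED_ENV_VARS` table and the `missing_env_vars` helper are hypothetical names, and the variable lists simply mirror this README.

```python
import os
from typing import Mapping

# Required variables per provider, copied from the lists in this README.
# The provider keys ("openai", "azure") are illustrative labels only.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_MODEL"],
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `provider` that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

The optional variables (`AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_API_KEY`) are deliberately left out of the check, since the examples can run without them.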
## Authentication

The Azure examples use `AzureCliCredential` for authentication. Run `az login` in your terminal before running them, or replace `AzureCliCredential` with your preferred authentication method (for example, pass the `api_key` parameter).

## Running the Examples

```bash
# Run OpenAI example
python openai_chat_multimodal.py

# Run Azure Chat example (requires az login or API key)
python azure_chat_multimodal.py

# Run Azure Responses example (requires az login or API key)
python azure_responses_multimodal.py
```

## Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files:

### Method 1: Data URIs (recommended)

```python
import base64

from agent_framework import Content

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
    image_base64 = base64.b64encode(image_data).decode('utf-8')
    image_uri = f"data:image/jpeg;base64,{image_base64}"

# Wrap the data URI in a Content object
image_content = Content.from_uri(
    uri=image_uri,
    media_type="image/jpeg"
)
```

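If you encode several files, the steps above can be wrapped in a small helper. This is a sketch that uses only the standard library; `file_to_data_uri` is a hypothetical name and is not part of the Agent Framework.

```python
import base64
from pathlib import Path


def file_to_data_uri(path: str, media_type: str) -> str:
    """Read a file and return its contents as a base64 data URI."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

The returned string can be passed as the `uri` argument of `Content.from_uri`, with the same `media_type` value, exactly as in Method 1.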
### Method 2: Raw bytes

```python
from agent_framework import Content

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Wrap the raw bytes in a Content object
image_content = Content.from_data(
    data=image_bytes,
    media_type="image/jpeg"
)
```

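Both methods need a `media_type` that matches the file. One way to derive it is a lookup by file extension, sketched below. The `MEDIA_TYPES` table and `media_type_for` are hypothetical names, and the media type strings are commonly used values chosen for illustration; check the sample scripts for the exact strings each API expects.

```python
from pathlib import Path

# Extension-to-media-type map for the formats listed under "Supported File Types".
# The strings are common conventions, not values confirmed by the samples.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".pdf": "application/pdf",
}


def media_type_for(path: str) -> str:
    """Return the media type for `path`, or raise ValueError if the extension is unknown."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported file type: {path!r}")
    return MEDIA_TYPES[suffix]
```

Failing early on an unknown extension avoids sending a file with a guessed or missing media type to the API.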
## Supported File Types

| Type      | Formats              | Notes                          |
| --------- | -------------------- | ------------------------------ |
| Images    | PNG, JPEG, GIF, WebP | Most common image formats      |
| Audio     | WAV, MP3             | For transcription and analysis |
| Documents | PDF                  | Text extraction and analysis   |

## API Differences

- **OpenAI Chat Completions API**: Supports images, audio, and PDF files
- **Azure OpenAI Chat Completions API**: Supports images only (PDF and audio files are not supported)
- **Azure OpenAI Responses API**: Supports images and PDF files

Choose the client that matches the content types you need to send and the APIs available to you.

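The differences above can also be written down as a small lookup, which is handy when deciding which example to start from. This is an illustrative sketch only: the keys in `API_SUPPORT` and the `apis_supporting` helper are hypothetical names, and the capabilities simply restate the list in this README.

```python
# Content kinds each API accepts, as described in the list above.
# Keys are illustrative labels, not identifiers from the Agent Framework.
API_SUPPORT = {
    "openai_chat": {"image", "audio", "pdf"},
    "azure_chat": {"image"},
    "azure_responses": {"image", "pdf"},
}


def apis_supporting(*content_kinds: str) -> list[str]:
    """Return the APIs (sorted by name) that accept every requested content kind."""
    needed = set(content_kinds)
    return sorted(name for name, kinds in API_SUPPORT.items() if needed <= kinds)
```

For example, `apis_supporting("image", "pdf")` returns the two APIs that handle both images and PDF files, while a request that includes `"audio"` narrows the result to the OpenAI Chat Completions example.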