# Multimodal Input Examples

This folder contains examples that show how to send multimodal content (images, audio, and PDF files) to AI agents using the Agent Framework.

## Examples

### OpenAI Chat Client

- **File**: `openai_chat_multimodal.py`
- **Description**: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
- **Supported formats**: PNG/JPEG images, WAV/MP3 audio, PDF documents

### Azure OpenAI Chat Client

- **File**: `azure_chat_multimodal.py`
- **Description**: Shows how to send images to the Azure OpenAI Chat Completions API
- **Supported formats**: PNG/JPEG images (PDF files are not supported by the Azure OpenAI Chat Completions API)

### Azure OpenAI Responses Client

- **File**: `azure_responses_multimodal.py`
- **Description**: Shows how to send images and PDF files to the Azure OpenAI Responses API
- **Supported formats**: PNG/JPEG images and PDF documents

## Environment Variables

Set the following environment variables before running the examples:

**For OpenAI:**

- `OPENAI_API_KEY`: Your OpenAI API key

**For Azure OpenAI:**

- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint
- `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`: The name of your Azure OpenAI chat model deployment
- `AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME`: The name of your Azure OpenAI Responses model deployment

Optionally, for Azure OpenAI:

- `AZURE_OPENAI_API_VERSION`: The API version to use (default: `2024-10-21`)
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key (if you are not using `AzureCliCredential`)

**Note:** You can also pass configuration directly in code instead of using environment variables:

```python
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

# Pass the deployment name and endpoint directly instead of reading them from the environment
client = AzureOpenAIChatClient(
    credential=AzureCliCredential(),
    deployment_name="your-deployment-name",
    endpoint="https://your-resource.openai.azure.com",
)
```

## Authentication

The Azure examples use `AzureCliCredential` for authentication. Run `az login` in your terminal before running them, or replace `AzureCliCredential` with your preferred authentication method (for example, pass the `api_key` parameter instead).

## Running the Examples

```bash
# Run the OpenAI example
python openai_chat_multimodal.py

# Run the Azure Chat example (requires az login or an API key)
python azure_chat_multimodal.py

# Run the Azure Responses example (requires az login or an API key)
python azure_responses_multimodal.py
```

## Using Your Own Files

The examples include small embedded test files for demonstration. To use your own files:

### Method 1: Data URIs (recommended)

```python
import base64

from agent_framework import Content

# Load and base64-encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()

image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Wrap the data URI in a Content object
image_content = Content.from_uri(
    uri=image_uri,
    media_type="image/jpeg",
)
```

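
The encoding step in Method 1 can be wrapped in a small reusable helper. This is a stdlib-only sketch: `to_data_uri` and `from_data_uri` are hypothetical helper names, not Agent Framework APIs, and the parser only handles the `data:<media type>;base64,<payload>` form that the encoder produces. The string returned by `to_data_uri` is what you would pass as `uri=` to `Content.from_uri`.

```python
import base64


def to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI (data:<media type>;base64,<payload>)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI back into (media_type, raw bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a 'data:<media type>;base64,<payload>' URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload, validate=True)


if __name__ == "__main__":
    # Works for any binary content, e.g. the bytes of a PDF read with open(path, "rb").
    uri = to_data_uri(b"%PDF-1.4", "application/pdf")
    print(uri)
    print(from_data_uri(uri))
```

The round-trip function is mainly useful for checking that a URI you built by hand is well formed before sending it.
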
### Method 2: Raw bytes

```python
from agent_framework import Content

# Load the raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Wrap the bytes in a Content object
image_content = Content.from_data(
    data=image_bytes,
    media_type="image/jpeg",
)
```

## Supported File Types

| Type      | Formats              | Notes                          |
| --------- | -------------------- | ------------------------------ |
| Images    | PNG, JPEG, GIF, WebP | Most common image formats      |
| Audio     | WAV, MP3             | For transcription and analysis |
| Documents | PDF                  | Text extraction and analysis   |

## API Differences

- **OpenAI Chat Completions API**: Supports images, audio, and PDF files
- **Azure OpenAI Chat Completions API**: Supports images only (no PDF or audio files)
- **Azure OpenAI Responses API**: Supports images and PDF files

Choose a client based on the content types you need to send and the APIs available to you.

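
As a rough illustration, the API differences can be encoded as a lookup that reports which of the sample clients accept every media type you plan to send. This is a hypothetical helper, not part of the Agent Framework. The capability sets are transcribed from this README, every `image/*` type is treated as an image even though the examples only exercise PNG/JPEG, and the sets may not reflect current service limits.

```python
# Content categories each client accepts, per the API differences listed in this README.
CLIENT_SUPPORT: dict[str, set[str]] = {
    "OpenAI Chat Completions": {"image", "audio", "pdf"},
    "Azure OpenAI Chat Completions": {"image"},
    "Azure OpenAI Responses": {"image", "pdf"},
}


def category(media_type: str) -> str:
    """Map a media type such as 'image/png' to a coarse category."""
    if media_type == "application/pdf":
        return "pdf"
    major = media_type.split("/", 1)[0]
    if major in ("image", "audio"):
        return major
    raise ValueError(f"unsupported media type: {media_type}")


def compatible_clients(media_types: list[str]) -> list[str]:
    """Return the clients that accept every media type in `media_types`."""
    needed = {category(mt) for mt in media_types}
    return [name for name, supported in CLIENT_SUPPORT.items() if needed <= supported]


if __name__ == "__main__":
    print(compatible_clients(["image/png", "application/pdf"]))
```
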