Eduard van Valkenburg a2856d3b92 Python: restructure: Python samples into progressive 01-05 layout (#3862)
* restructure: Python samples into progressive 01-05 layout

- 01-get-started/: 6 numbered steps (hello agent → hosting)
- 02-agents/: all agent concept samples (tools, middleware, providers, etc.)
- 03-workflows/: ALL existing workflow samples preserved as-is
- 04-hosting/: azure-functions, durabletask, a2a
- 05-end-to-end/: demos, evaluation, hosted agents
- Old files moved to _to_delete/ for review
- Added AGENTS.md with structure documentation
- autogen-migration/ and semantic-kernel-migration/ preserved at root

* fix: switch to AzureOpenAI Foundry, fix CI failures

- Switch all 01-get-started samples to AzureOpenAIResponsesClient with
  Azure AI Foundry project endpoint (AZURE_AI_PROJECT_ENDPOINT +
  AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME + AzureCliCredential)
- Add _to_delete/ and 05-end-to-end/ to pyrightconfig.samples.json excludes
- Fix test paths in packages/ that referenced old getting_started/ dirs:
  durabletask conftest + streaming test, azurefunctions conftest,
  devui conftest + capture_messages + openai_sdk_integration
- Fix workflow_as_agent_human_in_the_loop.py import (sibling import)
- Update hosting READMEs and tool comment paths
- Replace root README.md with new structure overview
- Update AGENTS.md to document Azure OpenAI Foundry as default provider

* cleanup: remove _to_delete folder, copy resource files to active dirs

All files in _to_delete/ were either:
- Exact duplicates of files in the new structure (240 files)
- Same file with only comment path updates (100 files)
- One import-fix diff (workflow_as_agent_human_in_the_loop.py)
- One superseded minimal_sample.py

Resource files (sample.pdf, countries.json, employees.pdf, weather.json)
copied to 02-agents/sample_assets/ and 02-agents/resources/ since active
samples reference them.

* fix: address PR review comments, centralize resources, remove root duplicates

- Fix type annotation in 04_memory.py (string union -> proper types)
- Fix old sample paths in observability files
- Fix grammar/spelling in observability samples
- Move sample_assets/ and resources/ to shared/ folder
- Remove 8 duplicate observability files from 02-agents root
- Update resource path references in multimodal_input and provider samples

* fix: update broken links from old getting_started paths to new structure

- Update relative paths in READMEs: getting_started/ → 01-get-started/,
  02-agents/, 03-workflows/, 04-hosting/, 05-end-to-end/
- Fix absolute GitHub URLs in package READMEs
- Fix broken link in ollama package README

* fix: convert absolute GitHub URLs to relative paths for link checker

Absolute URLs to python/samples/ on main branch 404 until PR merges.
Converted to relative paths that linkspector can verify locally.

* fix: update link for handoff sample moved to orchestrations/

* fix: update chatkit-integration README path from demos/ to 05-end-to-end/

* fix: update broken links in orchestrations README to match flat directory structure
2026-02-12 17:36:36 +00:00

# Multimodal Input Examples
This folder contains examples demonstrating how to send multimodal content (images, audio, PDF files) to AI agents using the Agent Framework.
## Examples
### OpenAI Chat Client
- **File**: `openai_chat_multimodal.py`
- **Description**: Shows how to send images, audio, and PDF files to OpenAI's Chat Completions API
- **Supported formats**: PNG/JPEG images, WAV/MP3 audio, PDF documents
### Azure OpenAI Chat Client
- **File**: `azure_chat_multimodal.py`
- **Description**: Shows how to send images to the Azure OpenAI Chat Completions API
- **Supported formats**: PNG/JPEG images (PDF files are NOT supported by Chat Completions API)
### Azure OpenAI Responses Client
- **File**: `azure_responses_multimodal.py`
- **Description**: Shows how to send images and PDF files to the Azure OpenAI Responses API
- **Supported formats**: PNG/JPEG images, PDF documents (full multimodal support)
## Environment Variables
Set the following environment variables before running the examples:
**For OpenAI:**

- `OPENAI_API_KEY`: Your OpenAI API key

**For Azure OpenAI:**

- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint
- `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`: The name of your Azure OpenAI chat model deployment
- `AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME`: The name of your Azure OpenAI responses model deployment

**Optional for Azure OpenAI:**

- `AZURE_OPENAI_API_VERSION`: The API version to use (default is `2024-10-21`)
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key (if not using `AzureCliCredential`)

**Note:** You can also provide configuration directly in code instead of using environment variables:
```python
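from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

# Example: pass deployment_name and endpoint directly instead of reading env vars
client = AzureOpenAIChatClient(
    credential=AzureCliCredential(),
    deployment_name="your-deployment-name",
    endpoint="https://your-resource.openai.azure.com",
)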
```
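If you stay with environment variables, a quick pre-flight check can save a failed run. The sketch below is not part of the samples: the `REQUIRED_ENV` grouping and the `missing_env` helper are hypothetical, and the assumption that each Azure example needs only its own deployment name is inferred from the variable names above.

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the list above; the per-example grouping is an assumption.
REQUIRED_ENV = {
    "openai_chat_multimodal.py": ["OPENAI_API_KEY"],
    "azure_chat_multimodal.py": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    ],
    "azure_responses_multimodal.py": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME",
    ],
}


def missing_env(example: str, environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for one example."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[example] if not env.get(name)]


if __name__ == "__main__":
    # Report the status of every example against the current environment.
    for example in REQUIRED_ENV:
        missing = missing_env(example)
        print(f"{example}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```

An empty list means the example has what it needs from the environment. Authentication (`az login` or an API key) is still a separate step.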
## Authentication
The Azure examples use `AzureCliCredential` for authentication. Run `az login` in your terminal before running them, or replace `AzureCliCredential` with your preferred authentication method (e.g., pass the `api_key` parameter).
## Running the Examples
```bash
# Run OpenAI example
python openai_chat_multimodal.py
# Run Azure Chat example (requires az login or API key)
python azure_chat_multimodal.py
# Run Azure Responses example (requires az login or API key)
python azure_responses_multimodal.py
```
## Using Your Own Files
The examples include small embedded test files for demonstration. To use your own files:
### Method 1: Data URIs (recommended)
```python
import base64

# Import path assumed: Content is taken from the top-level agent_framework package
from agent_framework import Content

# Load and encode your file
with open("path/to/your/image.jpg", "rb") as f:
    image_data = f.read()
image_base64 = base64.b64encode(image_data).decode("utf-8")
image_uri = f"data:image/jpeg;base64,{image_base64}"

# Wrap the data URI in a Content object
Content.from_uri(
    uri=image_uri,
    media_type="image/jpeg",
)
```
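To avoid hard-coding the media type for every file, the encoding step can be wrapped in a small helper. This is a minimal sketch using only the standard library; `file_to_data_uri` is a hypothetical name, not part of the Agent Framework, and it relies on the file extension to guess the type.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def file_to_data_uri(path: str, media_type: Optional[str] = None) -> Tuple[str, str]:
    """Return (data_uri, media_type) for a local file.

    The media type is guessed from the file extension unless given explicitly.
    """
    if media_type is None:
        media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"Cannot infer media type for {path!r}; pass media_type explicitly")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{encoded}", media_type
```

The returned pair maps directly onto the `uri` and `media_type` arguments of `Content.from_uri` shown above.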
### Method 2: Raw bytes
```python
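# Import path assumed: Content is taken from the top-level agent_framework package
from agent_framework import Content

# Load raw bytes
with open("path/to/your/image.jpg", "rb") as f:
    image_bytes = f.read()

# Wrap the raw bytes in a Content object
Content.from_data(
    data=image_bytes,
    media_type="image/jpeg",
)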
```
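When the bytes come from somewhere without a file name (an upload, a database blob), the media type can often be recovered from the leading signature bytes. The following is a rough sketch, not something the samples do: `sniff_media_type` is a hypothetical helper, it covers only formats with a simple fixed signature, and MP3 is left out because its header detection is more involved.

```python
from typing import Optional


def sniff_media_type(data: bytes) -> Optional[str]:
    """Guess a media type from well-known file signatures; return None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    # RIFF containers carry the format tag at bytes 8..12
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    return None
```

A `None` result means the type should be supplied by hand rather than guessed.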
## Supported File Types
| Type | Formats | Notes |
| --------- | -------------------- | ------------------------------ |
| Images | PNG, JPEG, GIF, WebP | Most common image formats |
| Audio | WAV, MP3 | For transcription and analysis |
| Documents | PDF | Text extraction and analysis |
## API Differences
- **OpenAI Chat Completions API**: Supports images, audio, and PDF files
- **Azure OpenAI Chat Completions API**: Supports images only (no PDF/audio file types)
- **Azure OpenAI Responses API**: Supports images and PDF files (full multimodal support)

Choose the appropriate client based on your multimodal needs and available APIs.
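
The differences listed above can be expressed as a small lookup to help pick an example for a given mix of inputs. This is only an illustrative sketch: the `SUPPORTED` table restates the list in this README, and `examples_supporting` is a hypothetical helper rather than a framework API.

```python
from typing import List

# Capability matrix transcribed from the "API Differences" list in this README.
SUPPORTED = {
    "openai_chat_multimodal.py": {"image", "audio", "pdf"},
    "azure_chat_multimodal.py": {"image"},
    "azure_responses_multimodal.py": {"image", "pdf"},
}


def examples_supporting(*kinds: str) -> List[str]:
    """Return the example files whose API accepts every requested content kind."""
    needed = set(kinds)
    return [name for name, caps in SUPPORTED.items() if needed <= caps]


if __name__ == "__main__":
    print(examples_supporting("image", "pdf"))
```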