# Get Started with Azure Content Understanding in Microsoft Agent Framework

Please install this package via pip:

```bash
pip install agent-framework-azure-contentunderstanding --pre
```

## Azure Content Understanding Integration

### Prerequisites

Before using this package, you need an Azure Content Understanding resource:

1. An active **Azure subscription** ([create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account))
2. A **Microsoft Foundry resource** created in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support)
3. **Default model deployments** configured for your resource (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)

Follow the [prerequisites section](https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument&pivots=programming-language-rest#prerequisites) in the Azure Content Understanding quickstart for setup instructions.

### Introduction

The Azure Content Understanding integration provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/) and injects structured results into the LLM context.

- **Document & image analysis**: State-of-the-art OCR with markdown extraction, table preservation, and structured field extraction — handles scanned PDFs, handwritten content, and complex layouts
- **Audio & video analysis**: Transcription, speaker diarization, and per-segment summaries
- **Background processing**: Configurable timeout with async background fallback for large files
- **file_search integration**: Optional vector store upload for token-efficient RAG on large documents

> Learn more about Azure Content Understanding capabilities at [https://learn.microsoft.com/azure/ai-services/content-understanding/](https://learn.microsoft.com/azure/ai-services/content-understanding/)

### Basic Usage Example

See the [samples directory](samples/) which demonstrates:

- Single PDF upload and Q&A ([01_document_qa](samples/01-get-started/01_document_qa.py))
- Multi-turn sessions with cached results ([02_multi_turn_session](samples/01-get-started/02_multi_turn_session.py))
- PDF + audio + video parallel analysis ([03_multimodal_chat](samples/01-get-started/03_multimodal_chat.py))
- Structured field extraction with prebuilt-invoice ([04_invoice_processing](samples/01-get-started/04_invoice_processing.py))
- CU extraction + OpenAI vector store RAG ([05_large_doc_file_search](samples/01-get-started/05_large_doc_file_search.py))
- Interactive web UI with DevUI ([02-devui](samples/02-devui/))

```python
import asyncio
from agent_framework import Agent, AgentSession, Message, Content
from agent_framework.foundry import FoundryChatClient
from agent_framework.foundry import ContentUnderstandingContextProvider
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    max_wait=None,  # block until CU extraction completes before sending to LLM
)

client = FoundryChatClient(
    project_endpoint="https://your-project.services.ai.azure.com",
    model="gpt-4.1",
    credential=credential,
)

async def main():
    async with cu:
        agent = Agent(
            client=client,
            name="DocumentQA",
            instructions="You are a helpful document analyst.",
            context_providers=[cu],
        )
        session = AgentSession()

        response = await agent.run(
            Message(role="user", contents=[
                Content.from_text("What's on this invoice?"),
                Content.from_uri(
                    "https://raw.githubusercontent.com/Azure-Samples/"
                    "azure-ai-content-understanding-assets/main/document/invoice.pdf",
                    media_type="application/pdf",
                    additional_properties={"filename": "invoice.pdf"},
                ),
            ]),
            session=session,
        )
        print(response.text)

asyncio.run(main())
```

### Supported File Types

| Category | Types |
|----------|-------|
| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
| Images | JPEG, PNG, TIFF, BMP |
| Audio | WAV, MP3, M4A, FLAC, OGG |
| Video | MP4, MOV, AVI, WebM |

For the complete list of supported file types and size limits, see [Azure Content Understanding service limits](https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits).

### Environment Variables

The provider supports automatic endpoint resolution from environment variables.
When ``endpoint`` is not passed to the constructor, it is loaded from
``AZURE_CONTENTUNDERSTANDING_ENDPOINT``:

```python
# Endpoint auto-loaded from AZURE_CONTENTUNDERSTANDING_ENDPOINT env var
cu = ContentUnderstandingContextProvider(credential=credential)
```

Set these in your shell or in a `.env` file:

```bash
AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
AZURE_AI_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1
```

You also need to be logged in with `az login` (for `AzureCliCredential`).

### Next steps

- Explore the [samples directory](samples/) for complete code examples
- Read the [Azure Content Understanding documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) for detailed service information
- Learn more about the [Microsoft Agent Framework](https://aka.ms/agent-framework)