# AGENTS.md — azure-contentunderstanding

## Package Overview

`agent-framework-azure-contentunderstanding` integrates Azure Content Understanding (CU) into the Agent Framework as a context provider. It automatically analyzes file attachments (documents, images, audio, video) and injects structured results into the LLM context.

## Public API

| Symbol | Type | Description |
|--------|------|-------------|
| `ContentUnderstandingContextProvider` | class | Main context provider — extends `ContextProvider` |
| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| `DocumentStatus` | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
| `FileSearchBackend` | ABC | Abstract interface for vector store file operations |
| `FileSearchConfig` | dataclass | Configuration for CU + vector store RAG mode |

## Architecture

- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect file attachments, call the CU API, manage session state with multi-document tracking, and auto-register retrieval tools for follow-up turns.
- **Analyzer auto-detection** — When `analyzer_id=None` (default), `_resolve_analyzer_id()` selects the CU analyzer based on media type prefix: `audio/` → `prebuilt-audioSearch`, `video/` → `prebuilt-videoSearch`, everything else → `prebuilt-documentSearch`.
- **Multi-segment output** — CU splits long video/audio into multiple scene segments (each a separate `contents[]` entry with its own `startTimeMs`, `endTimeMs`, `markdown`, and `fields`).
  `_extract_sections()` produces:
  - `segments`: list of per-segment dicts, each with `markdown`, `fields`, `start_time_s`, `end_time_s`
  - `markdown`: all segments concatenated at the top level with `---` separators (used for file_search uploads)
  - `duration_seconds`: the span from the global `min(startTimeMs)` to the global `max(endTimeMs)`, in seconds
  - Metadata (`kind`, `resolution`): taken from the first segment
- **Speaker diarization (not identification)** — CU transcripts distinguish speakers with generic per-speaker labels; CU does **not** identify speakers by name.
- **file_search RAG** — When `FileSearchConfig` is provided, CU-extracted markdown is uploaded to an OpenAI vector store and a `file_search` tool is registered on the context instead of injecting the full document content. This enables token-efficient retrieval for large documents.
- **`_models.py`** — `AnalysisSection` enum, `DocumentStatus` enum, `DocumentEntry` TypedDict, `FileSearchConfig` dataclass.
- **`_file_search.py`** — `FileSearchBackend` ABC, `OpenAIFileSearchBackend`, `FoundryFileSearchBackend`.

## Key Patterns

- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
- Uses a provider-scoped `state` dict for multi-document tracking across turns.
- Auto-registers a `list_documents()` tool via `context.extend_tools()`.
- Configurable timeout (`max_wait`) with an `asyncio.create_task()` background fallback.
- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
- An explicit `analyzer_id` always overrides auto-detection (user preference wins).
- Vector store resources are cleaned up in `close()` / `__aexit__`.
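The analyzer auto-detection rule from the Architecture section can be sketched as a plain function. This is a minimal sketch, not the package's `_resolve_analyzer_id()`: the name `resolve_analyzer_id`, its signature, and the case-insensitive prefix match are assumptions made for illustration. It also shows an explicit `analyzer_id` taking precedence over detection, as described under Key Patterns.

```python
from typing import Optional


def resolve_analyzer_id(media_type: str, analyzer_id: Optional[str] = None) -> str:
    """Hypothetical mirror of the prefix-based analyzer selection."""
    # An explicit analyzer_id always wins over auto-detection.
    if analyzer_id is not None:
        return analyzer_id
    # Media types are case-insensitive, so normalize before matching (assumption).
    media_type = media_type.lower()
    if media_type.startswith("audio/"):
        return "prebuilt-audioSearch"
    if media_type.startswith("video/"):
        return "prebuilt-videoSearch"
    # Documents, images, and anything else fall back to document search.
    return "prebuilt-documentSearch"


print(resolve_analyzer_id("video/mp4"))
print(resolve_analyzer_id("application/pdf", analyzer_id="prebuilt-invoice"))
```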
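The multi-segment merge performed by `_extract_sections()` can be approximated over raw `contents[]` dicts. In this hypothetical `extract_sections` sketch, the exact spacing around the `---` separator, the millisecond-to-second conversion helper, and reading `kind` and `resolution` from same-named keys on the first segment are assumptions, not the package's confirmed behavior.

```python
from typing import Any, Optional


def _ms_to_s(value: Optional[float]) -> Optional[float]:
    # CU reports times in milliseconds; this sketch exposes seconds.
    return None if value is None else value / 1000


def extract_sections(contents: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical merge of CU `contents[]` segments into one result dict."""
    segments = [
        {
            "markdown": item.get("markdown", ""),
            "fields": item.get("fields", {}),
            "start_time_s": _ms_to_s(item.get("startTimeMs")),
            "end_time_s": _ms_to_s(item.get("endTimeMs")),
        }
        for item in contents
    ]
    result: dict[str, Any] = {
        "segments": segments,
        # Top-level markdown joined with horizontal-rule separators (assumed spacing).
        "markdown": "\n\n---\n\n".join(s["markdown"] for s in segments),
    }
    starts = [c["startTimeMs"] for c in contents if "startTimeMs" in c]
    ends = [c["endTimeMs"] for c in contents if "endTimeMs" in c]
    if starts and ends:
        # Global span: earliest start to latest end, converted to seconds.
        result["duration_seconds"] = (max(ends) - min(starts)) / 1000
    if contents:
        # Metadata is copied from the first segment only.
        for key in ("kind", "resolution"):
            if key in contents[0]:
                result[key] = contents[0][key]
    return result
```

Keeping both the per-segment list and the joined markdown lets a caller inject timestamped scenes into the prompt while still uploading a single document to a vector store.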
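The `max_wait` timeout with an `asyncio.create_task()` background fallback can be illustrated with standard `asyncio` primitives. This is a hedged sketch: `analyze_with_deadline`, the `Status` enum (a stand-in for `DocumentStatus`), and the per-document state layout are hypothetical; only `asyncio.create_task`, `asyncio.wait`, and `Task.add_done_callback` are real library calls.

```python
import asyncio
from enum import Enum
from typing import Any, Awaitable, Callable


class Status(str, Enum):
    # Hypothetical stand-in for DocumentStatus; the real member values may differ.
    ANALYZING = "analyzing"
    READY = "ready"
    FAILED = "failed"


def _record(state: dict[str, dict[str, Any]], doc_id: str, task: asyncio.Task) -> None:
    # Copy a finished task's outcome into the per-document state entry.
    if task.cancelled():
        state[doc_id] = {"status": Status.FAILED, "error": "cancelled"}
    elif task.exception() is not None:
        state[doc_id] = {"status": Status.FAILED, "error": str(task.exception())}
    else:
        state[doc_id] = {"status": Status.READY, "result": task.result()}


async def analyze_with_deadline(
    state: dict[str, dict[str, Any]],
    doc_id: str,
    analyze: Callable[[], Awaitable[Any]],
    max_wait: float,
) -> Status:
    task = asyncio.create_task(analyze())
    # asyncio.wait returns on timeout without cancelling the task.
    done, _ = await asyncio.wait({task}, timeout=max_wait)
    if task in done:
        _record(state, doc_id, task)
    else:
        # Deadline hit: keep a reference, let the task finish in the background,
        # and update the state entry when it completes.
        state[doc_id] = {"status": Status.ANALYZING, "task": task}
        task.add_done_callback(lambda t: _record(state, doc_id, t))
    return state[doc_id]["status"]
```

`asyncio.wait` is used here instead of `asyncio.wait_for` on purpose: `wait_for` cancels the awaitable when the timeout expires, whereas `wait` simply returns, so the analysis keeps running and a later turn can pick up the result.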
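Stripping supported binary attachments before the request reaches the LLM amounts to partitioning each message's contents. The `Content` and `Message` dataclasses, the `strip_attachments` function, and the `SUPPORTED_PREFIXES` tuple below are hypothetical simplifications, not the Agent Framework's real message types or the provider's actual list of supported media types.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical: the provider's real supported-type list may differ.
SUPPORTED_PREFIXES = ("application/pdf", "image/", "audio/", "video/")


@dataclass
class Content:
    """Simplified stand-in for a message content item (hypothetical)."""
    type: str  # e.g. "text" or "data"
    media_type: Optional[str] = None
    value: str = ""


@dataclass
class Message:
    """Simplified stand-in for a chat message (hypothetical)."""
    role: str
    contents: list[Content] = field(default_factory=list)


def strip_attachments(messages: list[Message]) -> list[Content]:
    """Remove supported attachments from each message in place and return them."""
    stripped: list[Content] = []
    for message in messages:
        kept: list[Content] = []
        for content in message.contents:
            if content.media_type and content.media_type.startswith(SUPPORTED_PREFIXES):
                stripped.append(content)  # handed to CU instead of the LLM
            else:
                kept.append(content)  # text and unsupported types pass through
        message.contents = kept
    return stripped
```

In a provider built this way, the returned items would be sent for analysis while the trimmed messages continue to the model.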
## Samples

| Sample | Description |
|--------|-------------|
| `01_document_qa.py` | Upload a PDF via URL, ask questions about it |
| `02_multi_turn_session.py` | AgentSession persistence across turns |
| `03_multimodal_chat.py` | PDF + audio + video parallel analysis |
| `04_invoice_processing.py` | Structured field extraction with `prebuilt-invoice` analyzer |
| `05_large_doc_file_search.py` | CU extraction + OpenAI vector store RAG |
| `02-devui/01-multimodal_agent/` | DevUI web UI for CU-powered chat |
| `02-devui/02-file_search_agent/` | DevUI web UI combining CU + file_search RAG |

## Running Tests

```bash
uv run poe test -P azure-contentunderstanding
```