Python: Add Cosmos DB NoSQL Checkpoint Storage for Python Workflows (#4916)

* Add CosmosCheckpointStorage for Python workflow checkpointing

Add native Cosmos DB NoSQL support for workflow checkpoint storage in the
Python agent-framework-azure-cosmos package, achieving parity with the
existing .NET CosmosCheckpointStore.

New files:
- _checkpoint_storage.py: CosmosCheckpointStorage implementing the
  CheckpointStorage protocol with 6 methods (save, load, list_checkpoints,
  delete, get_latest, list_checkpoint_ids)
- test_cosmos_checkpoint_storage.py: Unit and integration tests
- workflow_checkpointing.py: Sample demonstrating Cosmos DB-backed
  workflow checkpoint/resume
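
For orientation, the six methods listed above can be pictured with an in-memory stand-in. This is only a sketch: the class name `InMemoryCheckpointStorage`, the dict-shaped checkpoint, its `checkpoint_id` / `workflow_name` / `timestamp` keys, and every signature below are assumptions for illustration, not the package's actual API.

```python
from __future__ import annotations

import asyncio
from typing import Any


class InMemoryCheckpointStorage:
    """Illustrative stand-in; NOT the package's CosmosCheckpointStorage."""

    def __init__(self) -> None:
        # Maps checkpoint_id -> checkpoint document (hypothetical shape).
        self._items: dict[str, dict[str, Any]] = {}

    async def save(self, checkpoint: dict[str, Any]) -> str:
        # Store a copy so later caller-side mutation does not leak in.
        self._items[checkpoint["checkpoint_id"]] = dict(checkpoint)
        return checkpoint["checkpoint_id"]

    async def load(self, checkpoint_id: str) -> dict[str, Any]:
        return dict(self._items[checkpoint_id])

    async def list_checkpoints(self, workflow_name: str) -> list[dict[str, Any]]:
        # Filter by workflow name, oldest first.
        matches = [c for c in self._items.values() if c["workflow_name"] == workflow_name]
        return sorted(matches, key=lambda c: c["timestamp"])

    async def list_checkpoint_ids(self, workflow_name: str) -> list[str]:
        return [c["checkpoint_id"] for c in await self.list_checkpoints(workflow_name)]

    async def get_latest(self, workflow_name: str) -> dict[str, Any] | None:
        checkpoints = await self.list_checkpoints(workflow_name)
        return checkpoints[-1] if checkpoints else None

    async def delete(self, checkpoint_id: str) -> bool:
        # True when something was actually removed.
        return self._items.pop(checkpoint_id, None) is not None


async def demo() -> list[str]:
    storage = InMemoryCheckpointStorage()
    await storage.save({"checkpoint_id": "c1", "workflow_name": "wf", "timestamp": 1})
    await storage.save({"checkpoint_id": "c2", "workflow_name": "wf", "timestamp": 2})
    latest = await storage.get_latest("wf")
    assert latest is not None and latest["checkpoint_id"] == "c2"
    await storage.delete("c1")
    return await storage.list_checkpoint_ids("wf")
```

Running `asyncio.run(demo())` saves two checkpoints, deletes the older one, and returns the remaining IDs for the workflow.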

Auth support:
- Managed identity / RBAC via Azure credential objects
  (DefaultAzureCredential, ManagedIdentityCredential, etc.)
- Key-based auth via account key string or AZURE_COSMOS_KEY env var
- Pre-created CosmosClient or ContainerProxy
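
One way to picture the key-or-credential choice is a small resolver. This is a hedged sketch: `resolve_cosmos_credential` is a hypothetical helper, and the precedence shown (explicit argument first, then the `AZURE_COSMOS_KEY` environment variable) is an assumption rather than the documented behavior of the new class.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any


def resolve_cosmos_credential(
    credential: Any | None = None,
    env: Mapping[str, str] | None = None,
) -> Any:
    """Return the explicit credential, else the AZURE_COSMOS_KEY value.

    Hypothetical helper: the real constructor may resolve credentials differently.
    """
    if credential is not None:
        # Covers both an account key string and a token credential object.
        return credential
    source = os.environ if env is None else env
    key = source.get("AZURE_COSMOS_KEY")
    if key:
        return key
    raise ValueError("Pass a credential or set AZURE_COSMOS_KEY.")
```

The `env` parameter exists only so the lookup can be exercised without touching the real process environment.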

Key design decisions:
- Partition key: /workflow_name for efficient per-workflow queries
- Serialization: Reuses encode/decode_checkpoint_value for full Python
  object fidelity (hybrid JSON + pickle approach)
- Container auto-creation via create_container_if_not_exists

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Adding cosmos checkpointer

* Resolving comments

* Fixing builds

* Adding sample for history provider and checkpoint storage

* Resolving comments

* fixing builds

* Resolving comments

---------

Co-authored-by: Aayush Kataria <aayushkataria@Aayushs-MacBook-Pro-2.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
Aayush Kataria
2026-04-08 22:01:41 -07:00
committed by GitHub
parent a7a02c1abd
commit 30a2bc3dcb
11 changed files with 1989 additions and 8 deletions
@@ -9,6 +9,9 @@ These samples demonstrate different approaches to managing conversation history
| [`suspend_resume_session.py`](suspend_resume_session.py) | Suspend and resume conversation sessions, comparing service-managed sessions (Azure AI Foundry) with in-memory sessions (OpenAI). |
| [`custom_history_provider.py`](custom_history_provider.py) | Implement a custom history provider by extending `HistoryProvider`, enabling conversation persistence in your preferred storage backend. |
| [`cosmos_history_provider.py`](cosmos_history_provider.py) | Use Azure Cosmos DB as a history provider for durable conversation storage with `CosmosHistoryProvider`. |
| [`cosmos_history_provider_conversation_persistence.py`](cosmos_history_provider_conversation_persistence.py) | Persist and resume conversations across application restarts using `CosmosHistoryProvider` — serialize session state, restore it, and continue with full Cosmos DB history. |
| [`cosmos_history_provider_messages.py`](cosmos_history_provider_messages.py) | Direct message history operations — retrieve stored messages as a transcript, clear session history, and verify data deletion. |
| [`cosmos_history_provider_sessions.py`](cosmos_history_provider_sessions.py) | Multi-session and multi-tenant management — per-tenant session isolation, `list_sessions()` to enumerate, switch between sessions, and resume specific conversations. |
| [`redis_history_provider.py`](redis_history_provider.py) | Use Redis as a history provider for persistent conversation history storage across sessions. |
## Prerequisites
@@ -22,7 +25,7 @@ These samples demonstrate different approaches to managing conversation history
**For `custom_history_provider.py`:**
- `OPENAI_API_KEY`: Your OpenAI API key
**For `cosmos_history_provider.py`:**
**For Cosmos DB samples (`cosmos_history_provider*.py`):**
- `FOUNDRY_PROJECT_ENDPOINT`: Your Azure AI Foundry project endpoint
- `FOUNDRY_MODEL`: The Foundry model deployment name
- `AZURE_COSMOS_ENDPOINT`: Your Azure Cosmos DB account endpoint
@@ -0,0 +1,165 @@
# Copyright (c) Microsoft. All rights reserved.
# ruff: noqa: T201
import asyncio
import os
from agent_framework import Agent, AgentSession
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_cosmos import CosmosHistoryProvider
from azure.identity.aio import AzureCliCredential
from dotenv import load_dotenv
# Load environment variables from .env file.
load_dotenv()
"""
This sample demonstrates persisting and resuming conversations across application
restarts using CosmosHistoryProvider as the persistent backend.
Key components:
- Phase 1: Run a conversation and serialize the session with session.to_dict()
- Phase 2: Simulate an app restart — create new provider and agent instances,
restore the session with AgentSession.from_dict(), and continue the conversation
- Cosmos DB reloads the full message history, so the agent remembers everything
Environment variables:
FOUNDRY_PROJECT_ENDPOINT
FOUNDRY_MODEL
AZURE_COSMOS_ENDPOINT
AZURE_COSMOS_DATABASE_NAME
AZURE_COSMOS_CONTAINER_NAME
Optional:
AZURE_COSMOS_KEY
"""
async def main() -> None:
    """Run the conversation persistence sample."""
    project_endpoint = os.getenv("FOUNDRY_PROJECT_ENDPOINT")
    model = os.getenv("FOUNDRY_MODEL")
    cosmos_endpoint = os.getenv("AZURE_COSMOS_ENDPOINT")
    cosmos_database_name = os.getenv("AZURE_COSMOS_DATABASE_NAME")
    cosmos_container_name = os.getenv("AZURE_COSMOS_CONTAINER_NAME")
    cosmos_key = os.getenv("AZURE_COSMOS_KEY")

    if (
        not project_endpoint
        or not model
        or not cosmos_endpoint
        or not cosmos_database_name
        or not cosmos_container_name
    ):
        print(
            "Please set FOUNDRY_PROJECT_ENDPOINT, FOUNDRY_MODEL, "
            "AZURE_COSMOS_ENDPOINT, AZURE_COSMOS_DATABASE_NAME, and AZURE_COSMOS_CONTAINER_NAME."
        )
        return

    # ── Phase 1: Initial conversation ──
    print("=== Phase 1: Initial conversation ===\n")
    async with (
        AzureCliCredential() as credential,
        CosmosHistoryProvider(
            endpoint=cosmos_endpoint,
            database_name=cosmos_database_name,
            container_name=cosmos_container_name,
            credential=cosmos_key or credential,
        ) as history_provider,
        Agent(
            client=FoundryChatClient(
                project_endpoint=project_endpoint,
                model=model,
                credential=credential,
            ),
            name="PersistentAgent",
            instructions="You are a helpful assistant that remembers prior turns.",
            context_providers=[history_provider],
            default_options={"store": False},
        ) as agent,
    ):
        session = agent.create_session()

        response1 = await agent.run(
            "My name is Ada. I'm building a distributed database in Rust.", session=session
        )
        print("User: My name is Ada. I'm building a distributed database in Rust.")
        print(f"Assistant: {response1.text}\n")

        response2 = await agent.run("The hardest part is the consensus algorithm.", session=session)
        print("User: The hardest part is the consensus algorithm.")
        print(f"Assistant: {response2.text}\n")

        serialized_session = session.to_dict()
        print(f"Session serialized. Session ID: {session.session_id}")

    # ── Phase 2: Simulate app restart ──
    print("\n=== Phase 2: Resuming after 'restart' ===\n")
    async with (
        AzureCliCredential() as credential,
        CosmosHistoryProvider(
            endpoint=cosmos_endpoint,
            database_name=cosmos_database_name,
            container_name=cosmos_container_name,
            credential=cosmos_key or credential,
        ) as history_provider,
        Agent(
            client=FoundryChatClient(
                project_endpoint=project_endpoint,
                model=model,
                credential=credential,
            ),
            name="PersistentAgent",
            instructions="You are a helpful assistant that remembers prior turns.",
            context_providers=[history_provider],
            default_options={"store": False},
        ) as agent,
    ):
        restored_session = AgentSession.from_dict(serialized_session)
        print(f"Session restored. Session ID: {restored_session.session_id}\n")

        response3 = await agent.run("What was I working on and what was the challenge?", session=restored_session)
        print("User: What was I working on and what was the challenge?")
        print(f"Assistant: {response3.text}\n")

        messages = await history_provider.get_messages(restored_session.session_id)
        print(f"Messages stored in Cosmos DB: {len(messages)}")
        for i, msg in enumerate(messages, 1):
            print(f"  {i}. [{msg.role}] {msg.text[:80]}...")


if __name__ == "__main__":
    asyncio.run(main())
"""
Sample output:
=== Phase 1: Initial conversation ===
User: My name is Ada. I'm building a distributed database in Rust.
Assistant: That sounds like a great project, Ada! Rust is an excellent choice for ...
User: The hardest part is the consensus algorithm.
Assistant: Consensus algorithms can be tricky! Are you looking at Raft, Paxos, or ...
Session serialized. Session ID: <session-uuid>
=== Phase 2: Resuming after 'restart' ===
Session restored. Session ID: <session-uuid>
User: What was I working on and what was the challenge?
Assistant: You told me you're building a distributed database in Rust and that the hardest
part is the consensus algorithm.
Messages stored in Cosmos DB: 6
1. [user] My name is Ada. I'm building a distributed database in Rust....
2. [assistant] That sounds like a great project, Ada! Rust is an excellent ch...
3. [user] The hardest part is the consensus algorithm....
4. [assistant] Consensus algorithms can be tricky! Are you looking at Raft, Pa...
5. [user] What was I working on and what was the challenge?...
6. [assistant] You told me you're building a distributed database in Rust and ...
"""
@@ -0,0 +1,157 @@
# Copyright (c) Microsoft. All rights reserved.
# ruff: noqa: T201
import asyncio
import os
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_cosmos import CosmosHistoryProvider
from azure.identity.aio import AzureCliCredential
from dotenv import load_dotenv
# Load environment variables from .env file.
load_dotenv()
"""
This sample demonstrates direct message history operations using
CosmosHistoryProvider — retrieving, displaying, and clearing stored messages.
Key components:
- get_messages(session_id): Retrieve all stored messages as a chat transcript
- clear(session_id): Delete all messages for a session (e.g., GDPR compliance)
- Verifying that history is empty after clearing
- Running a new conversation in the same session after clearing
Environment variables:
FOUNDRY_PROJECT_ENDPOINT
FOUNDRY_MODEL
AZURE_COSMOS_ENDPOINT
AZURE_COSMOS_DATABASE_NAME
AZURE_COSMOS_CONTAINER_NAME
Optional:
AZURE_COSMOS_KEY
"""
async def main() -> None:
    """Run the messages history sample."""
    project_endpoint = os.getenv("FOUNDRY_PROJECT_ENDPOINT")
    model = os.getenv("FOUNDRY_MODEL")
    cosmos_endpoint = os.getenv("AZURE_COSMOS_ENDPOINT")
    cosmos_database_name = os.getenv("AZURE_COSMOS_DATABASE_NAME")
    cosmos_container_name = os.getenv("AZURE_COSMOS_CONTAINER_NAME")
    cosmos_key = os.getenv("AZURE_COSMOS_KEY")

    if (
        not project_endpoint
        or not model
        or not cosmos_endpoint
        or not cosmos_database_name
        or not cosmos_container_name
    ):
        print(
            "Please set FOUNDRY_PROJECT_ENDPOINT, FOUNDRY_MODEL, "
            "AZURE_COSMOS_ENDPOINT, AZURE_COSMOS_DATABASE_NAME, and AZURE_COSMOS_CONTAINER_NAME."
        )
        return

    async with (
        AzureCliCredential() as credential,
        CosmosHistoryProvider(
            endpoint=cosmos_endpoint,
            database_name=cosmos_database_name,
            container_name=cosmos_container_name,
            credential=cosmos_key or credential,
        ) as history_provider,
        Agent(
            client=FoundryChatClient(
                project_endpoint=project_endpoint,
                model=model,
                credential=credential,
            ),
            name="HistoryAgent",
            instructions="You are a helpful assistant that remembers prior turns.",
            context_providers=[history_provider],
            default_options={"store": False},
        ) as agent,
    ):
        session = agent.create_session()
        session_id = session.session_id

        # 1. Have a multi-turn conversation.
        print("=== Building a conversation ===\n")
        queries = [
            "Hi! My favorite programming language is Python.",
            "I also enjoy hiking in the mountains on weekends.",
            "What do you know about me so far?",
        ]
        for query in queries:
            response = await agent.run(query, session=session)
            print(f"User: {query}")
            print(f"Assistant: {response.text}\n")

        # 2. Retrieve and display the full message history as a transcript.
        print("=== Chat transcript from Cosmos DB ===\n")
        messages = await history_provider.get_messages(session_id)
        print(f"Total messages stored: {len(messages)}\n")
        for i, msg in enumerate(messages, 1):
            print(f"  {i}. [{msg.role}] {msg.text[:100]}")

        # 3. Clear the session history.
        print("\n=== Clearing session history ===\n")
        await history_provider.clear(session_id)
        print(f"Cleared all messages for session: {session_id}")

        # 4. Verify history is empty.
        remaining = await history_provider.get_messages(session_id)
        print(f"Messages after clear: {len(remaining)}")

        # 5. Start a fresh conversation in the same session; the agent has no memory.
        print("\n=== Fresh conversation (same session, no memory) ===\n")
        response = await agent.run("What do you know about me?", session=session)
        print("User: What do you know about me?")
        print(f"Assistant: {response.text}")


if __name__ == "__main__":
    asyncio.run(main())
"""
Sample output:
=== Building a conversation ===
User: Hi! My favorite programming language is Python.
Assistant: That's great! Python is a wonderful language. What do you like most about it?
User: I also enjoy hiking in the mountains on weekends.
Assistant: Hiking sounds lovely! Do you have a favorite trail or mountain range?
User: What do you know about me so far?
Assistant: You love Python as your favorite programming language and enjoy hiking in the mountains on weekends.
=== Chat transcript from Cosmos DB ===
Total messages stored: 6
1. [user] Hi! My favorite programming language is Python.
2. [assistant] That's great! Python is a wonderful language. What do you like most about it?
3. [user] I also enjoy hiking in the mountains on weekends.
4. [assistant] Hiking sounds lovely! Do you have a favorite trail or mountain range?
5. [user] What do you know about me so far?
6. [assistant] You love Python as your favorite programming language and enjoy hiking ...
=== Clearing session history ===
Cleared all messages for session: <session-uuid>
Messages after clear: 0
=== Fresh conversation (same session, no memory) ===
User: What do you know about me?
Assistant: I don't have any information about you yet. Feel free to share anything you'd like!
"""
@@ -0,0 +1,197 @@
# Copyright (c) Microsoft. All rights reserved.
# ruff: noqa: T201
import asyncio
import os
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_cosmos import CosmosHistoryProvider
from azure.identity.aio import AzureCliCredential
from dotenv import load_dotenv
# Load environment variables from .env file.
load_dotenv()
"""
This sample demonstrates multi-session and multi-tenant management using
CosmosHistoryProvider. Each tenant (user) gets isolated conversation sessions
stored in the same Cosmos DB container, partitioned by session_id.
Key components:
- Per-tenant session isolation using prefixed session IDs
- list_sessions(): Enumerate all stored sessions across tenants
- Switching between sessions for different users
- Resuming a specific user's session — verifying data isolation
Environment variables:
FOUNDRY_PROJECT_ENDPOINT
FOUNDRY_MODEL
AZURE_COSMOS_ENDPOINT
AZURE_COSMOS_DATABASE_NAME
AZURE_COSMOS_CONTAINER_NAME
Optional:
AZURE_COSMOS_KEY
"""
async def main() -> None:
    """Run the session management sample."""
    project_endpoint = os.getenv("FOUNDRY_PROJECT_ENDPOINT")
    model = os.getenv("FOUNDRY_MODEL")
    cosmos_endpoint = os.getenv("AZURE_COSMOS_ENDPOINT")
    cosmos_database_name = os.getenv("AZURE_COSMOS_DATABASE_NAME")
    cosmos_container_name = os.getenv("AZURE_COSMOS_CONTAINER_NAME")
    cosmos_key = os.getenv("AZURE_COSMOS_KEY")

    if (
        not project_endpoint
        or not model
        or not cosmos_endpoint
        or not cosmos_database_name
        or not cosmos_container_name
    ):
        print(
            "Please set FOUNDRY_PROJECT_ENDPOINT, FOUNDRY_MODEL, "
            "AZURE_COSMOS_ENDPOINT, AZURE_COSMOS_DATABASE_NAME, and AZURE_COSMOS_CONTAINER_NAME."
        )
        return

    async with (
        AzureCliCredential() as credential,
        CosmosHistoryProvider(
            endpoint=cosmos_endpoint,
            database_name=cosmos_database_name,
            container_name=cosmos_container_name,
            credential=cosmos_key or credential,
        ) as history_provider,
        Agent(
            client=FoundryChatClient(
                project_endpoint=project_endpoint,
                model=model,
                credential=credential,
            ),
            name="MultiTenantAgent",
            instructions="You are a helpful assistant that remembers prior turns.",
            context_providers=[history_provider],
            default_options={"store": False},
        ) as agent,
    ):
        # 1. Tenant "alice" starts a conversation about travel.
        print("=== Tenant: Alice — Travel conversation ===\n")
        alice_session = agent.create_session(session_id="tenant-alice-session-1")
        response = await agent.run(
            "Hi! I'm planning a trip to Italy. I love Renaissance art.", session=alice_session
        )
        print("Alice: I'm planning a trip to Italy. I love Renaissance art.")
        print(f"Assistant: {response.text}\n")
        response = await agent.run("Which museums should I visit in Florence?", session=alice_session)
        print("Alice: Which museums should I visit in Florence?")
        print(f"Assistant: {response.text}\n")

        # 2. Tenant "bob" starts a separate conversation about cooking.
        print("=== Tenant: Bob — Cooking conversation ===\n")
        bob_session = agent.create_session(session_id="tenant-bob-session-1")
        response = await agent.run(
            "Hey! I'm learning to cook Thai food. I just made pad thai.", session=bob_session
        )
        print("Bob: I'm learning to cook Thai food. I just made pad thai.")
        print(f"Assistant: {response.text}\n")
        response = await agent.run("What Thai dish should I try next?", session=bob_session)
        print("Bob: What Thai dish should I try next?")
        print(f"Assistant: {response.text}\n")

        # 3. List all sessions stored in Cosmos DB.
        print("=== Listing all sessions ===\n")
        sessions = await history_provider.list_sessions()
        print(f"Found {len(sessions)} session(s):")
        for sid in sessions:
            print(f"  - {sid}")

        # 4. Resume Alice's session: verify she gets her travel context back.
        print("\n=== Resuming Alice's session ===\n")
        alice_resumed = agent.create_session(session_id="tenant-alice-session-1")
        response = await agent.run("What were we discussing?", session=alice_resumed)
        print("Alice: What were we discussing?")
        print(f"Assistant: {response.text}\n")

        # 5. Resume Bob's session: verify he gets his cooking context back.
        print("=== Resuming Bob's session ===\n")
        bob_resumed = agent.create_session(session_id="tenant-bob-session-1")
        response = await agent.run("What was the last dish I mentioned?", session=bob_resumed)
        print("Bob: What was the last dish I mentioned?")
        print(f"Assistant: {response.text}\n")

        # 6. Show per-session message counts.
        print("=== Per-session message counts ===\n")
        alice_messages = await history_provider.get_messages("tenant-alice-session-1")
        bob_messages = await history_provider.get_messages("tenant-bob-session-1")
        print(f"Alice's session: {len(alice_messages)} messages")
        print(f"Bob's session: {len(bob_messages)} messages")

        # 7. Clean up: clear both sessions.
        print("\n=== Cleaning up ===\n")
        await history_provider.clear("tenant-alice-session-1")
        await history_provider.clear("tenant-bob-session-1")
        print("Cleared Alice's and Bob's sessions.")


if __name__ == "__main__":
    asyncio.run(main())
"""
Sample output:
=== Tenant: Alice — Travel conversation ===
Alice: I'm planning a trip to Italy. I love Renaissance art.
Assistant: Italy is a dream for Renaissance art lovers! Florence, Rome, and Venice ...
Alice: Which museums should I visit in Florence?
Assistant: In Florence, the Uffizi Gallery is a must — it has Botticelli's Birth of Venus ...
=== Tenant: Bob — Cooking conversation ===
Bob: I'm learning to cook Thai food. I just made pad thai.
Assistant: Pad thai is a great start! How did it turn out?
Bob: What Thai dish should I try next?
Assistant: I'd suggest trying green curry or tom yum soup — both are classic Thai dishes ...
=== Listing all sessions ===
Found 2 session(s):
- tenant-alice-session-1
- tenant-bob-session-1
=== Resuming Alice's session ===
Alice: What were we discussing?
Assistant: We were discussing your trip to Italy and your love for Renaissance art ...
=== Resuming Bob's session ===
Bob: What was the last dish I mentioned?
Assistant: You mentioned pad thai — it was the dish you just made!
=== Per-session message counts ===
Alice's session: 6 messages
Bob's session: 6 messages
=== Cleaning up ===
Cleared Alice's and Bob's sessions.
"""