mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
Python: Implement annotation-based context compaction (#4469)
* Implement annotation-based context compaction Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Handle missing compaction attributes in BaseChatClient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix CI typing and bandit issues Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Optimize incremental compaction annotation pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refinement * Python: add ToolResultCompactionStrategy and CompactionProvider Add ToolResultCompactionStrategy that collapses older tool-call groups into short summary messages (e.g. [Tool calls: get_weather]) while keeping the most recent groups verbatim. This mirrors the .NET ToolResultCompactionStrategy from PR #4533. Add CompactionProvider as a context-provider that auto-applies compaction before each agent turn and stores compacted history in session state after each turn. Includes tests and samples for both features. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refinement and alignment with dotnet PR * updated tool result compaction * updated tool result compaction * Python: add ToolResultCompactionStrategy, CompactionProvider, and skip_excluded - ToolResultCompactionStrategy collapses older tool-call groups into [Tool results: func_name: result] summaries with bidirectional tracing (same pattern as SummarizationStrategy). - CompactionProvider as BaseContextProvider with separate before_strategy and after_strategy parameters. before_strategy compacts loaded context; after_strategy compacts stored history via history_source_id. - InMemoryHistoryProvider gains skip_excluded flag to filter out messages marked as excluded by compaction strategies. - Tests, samples, and exports updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fixed checks * fix mypy * Fix: ensure summary messages from both strategies get full compaction annotations SummarizationStrategy was not calling annotate_message_groups after inserting its summary message, so the summary lacked core group annotations (id, kind, index, has_reasoning, _excluded). Added the missing call. ToolResultCompactionStrategy already had it. Added tests verifying both strategies produce fully annotated summaries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * updated propagation * fix mypy --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
Unverified
parent
565c0b1623
commit
3e03a305f6
@@ -0,0 +1,23 @@
|
||||
# Context Compaction Samples
|
||||
|
||||
This folder demonstrates context compaction patterns introduced by ADR-0019.
|
||||
|
||||
## Files
|
||||
|
||||
- `basics.py` — builds a local message list and applies each built-in strategy one at a time.
|
||||
- `advanced.py` — composes multiple strategies with `TokenBudgetComposedStrategy`.
|
||||
- `agent_client_overrides.py` — shows client defaults, agent-level overrides, and per-run compaction overrides.
|
||||
- `custom.py` — defines a custom strategy implementing the `CompactionStrategy` protocol.
|
||||
- `tiktoken_tokenizer.py` — shows a `TokenizerProtocol` implementation backed by `tiktoken`.
|
||||
- `compaction_provider.py` — uses `CompactionProvider` with an agent and `InMemoryHistoryProvider`.
|
||||
|
||||
Run samples with:
|
||||
|
||||
```bash
|
||||
uv run samples/02-agents/compaction/basics.py
|
||||
uv run samples/02-agents/compaction/advanced.py
|
||||
uv run samples/02-agents/compaction/agent_client_overrides.py
|
||||
uv run samples/02-agents/compaction/custom.py
|
||||
uv run samples/02-agents/compaction/tiktoken_tokenizer.py
|
||||
uv run samples/02-agents/compaction/compaction_provider.py # requires OPENAI_API_KEY
|
||||
```
|
||||
@@ -0,0 +1,115 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
import asyncio
|
||||
from typing import Any
|
||||
|
||||
from agent_framework import (
|
||||
CharacterEstimatorTokenizer,
|
||||
ChatResponse,
|
||||
Message,
|
||||
SelectiveToolCallCompactionStrategy,
|
||||
SlidingWindowStrategy,
|
||||
SummarizationStrategy,
|
||||
TokenBudgetComposedStrategy,
|
||||
annotate_message_groups,
|
||||
apply_compaction,
|
||||
included_token_count,
|
||||
)
|
||||
|
||||
"""This sample demonstrates composed in-run compaction with a token budget.
|
||||
|
||||
Key components:
|
||||
- TokenBudgetComposedStrategy
|
||||
- Sequential strategy composition
|
||||
- Summarization with a SupportsChatGetResponse-compatible summarizer client
|
||||
"""
|
||||
|
||||
|
||||
class BudgetSummaryClient:
|
||||
async def get_response(
|
||||
self,
|
||||
messages: list[Message],
|
||||
*,
|
||||
stream: bool = False,
|
||||
options: dict[str, Any] | None = None,
|
||||
**kwargs: Any,
|
||||
) -> ChatResponse:
|
||||
summary_text = f"Budget summary generated from {len(messages)} prompt messages."
|
||||
return ChatResponse(messages=[Message(role="assistant", text=summary_text)])
|
||||
|
||||
|
||||
def _build_long_history() -> list[Message]:
|
||||
history = [Message(role="system", text="You are a migration copilot.")]
|
||||
for i in range(1, 8):
|
||||
history.append(
|
||||
Message(
|
||||
role="user",
|
||||
text=f"Iteration {i}: capture migration requirements and edge cases.",
|
||||
)
|
||||
)
|
||||
history.append(
|
||||
Message(
|
||||
role="assistant",
|
||||
text=(
|
||||
f"Iteration {i}: detailed plan with dependencies, rollback guidance, and testing details. "
|
||||
"This sentence is intentionally long to create token pressure."
|
||||
),
|
||||
)
|
||||
)
|
||||
return history
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
# 1. Build synthetic history representing long-running in-run growth.
|
||||
messages = _build_long_history()
|
||||
|
||||
# 2. Configure tokenizer and measure token count before compaction.
|
||||
tokenizer = CharacterEstimatorTokenizer()
|
||||
annotate_message_groups(messages, tokenizer=tokenizer)
|
||||
budget_before = included_token_count(messages)
|
||||
|
||||
# 3. Configure composed strategy stack.
|
||||
composed = TokenBudgetComposedStrategy(
|
||||
token_budget=200,
|
||||
tokenizer=tokenizer,
|
||||
strategies=[
|
||||
SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=0),
|
||||
SummarizationStrategy(
|
||||
client=BudgetSummaryClient(),
|
||||
target_count=3,
|
||||
threshold=3,
|
||||
),
|
||||
SlidingWindowStrategy(keep_last_groups=4),
|
||||
],
|
||||
)
|
||||
|
||||
# 4. Apply compaction and inspect the budget result.
|
||||
projected = await apply_compaction(messages, strategy=composed, tokenizer=tokenizer)
|
||||
budget_after = included_token_count(messages)
|
||||
|
||||
print(f"Projected messages after compaction: {len(projected)}")
|
||||
print(f"Included token count before compaction: {budget_before}")
|
||||
print(f"Included token count after compaction: {budget_after}")
|
||||
print("Projected roles:", [m.role for m in projected])
|
||||
print("Projected messages with token counts:")
|
||||
for msg in projected:
|
||||
group = msg.additional_properties.get("_group")
|
||||
token_count = group.get("token_count") if isinstance(group, dict) else None
|
||||
text_preview = msg.text[:80] if msg.text else "<non-text>"
|
||||
print(f"- [{msg.role}] {text_preview} ({token_count} tokens)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
"""
|
||||
Sample output:
|
||||
Projected messages after compaction: 3
|
||||
Included token count before compaction: 793
|
||||
Included token count after compaction: 144
|
||||
Projected roles: ['system', 'user', 'assistant']
|
||||
Projected messages with token counts:
|
||||
- [system] You are a migration copilot. (35 tokens)
|
||||
- [user] Iteration 7: capture migration requirements and edge cases. (43 tokens)
|
||||
- [assistant] Iteration 7: detailed plan with dependencies, rollback guidance, and testing det (66 tokens)
|
||||
"""
|
||||
@@ -0,0 +1,144 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
from collections.abc import Awaitable, Mapping, Sequence
|
||||
from typing import Any
|
||||
|
||||
from agent_framework import (
|
||||
GROUP_ANNOTATION_KEY,
|
||||
GROUP_TOKEN_COUNT_KEY,
|
||||
Agent,
|
||||
BaseChatClient,
|
||||
ChatResponse,
|
||||
Message,
|
||||
SlidingWindowStrategy,
|
||||
TruncationStrategy,
|
||||
)
|
||||
|
||||
"""This sample demonstrates client defaults, agent overrides, and run-level overrides for in-run compaction.
|
||||
|
||||
Key components:
|
||||
- A shared client with default `compaction_strategy` and `tokenizer`
|
||||
- An agent-level override that takes precedence over the shared client defaults
|
||||
- A run-level override passed through `agent.run(...)`
|
||||
"""
|
||||
|
||||
|
||||
class FixedTokenizer:
|
||||
"""Simple tokenizer used to make token annotations easy to inspect."""
|
||||
|
||||
def __init__(self, token_count: int) -> None:
|
||||
self._token_count = token_count
|
||||
|
||||
def count_tokens(self, text: str) -> int:
|
||||
return self._token_count
|
||||
|
||||
|
||||
class InspectingChatClient(BaseChatClient[Any]):
|
||||
"""Chat client that records the messages it receives after compaction."""
|
||||
|
||||
def __init__(self, **kwargs: Any) -> None:
|
||||
super().__init__(**kwargs)
|
||||
self.last_messages: list[Message] = []
|
||||
|
||||
def _inner_get_response(
|
||||
self,
|
||||
*,
|
||||
messages: Sequence[Message],
|
||||
stream: bool,
|
||||
options: Mapping[str, Any],
|
||||
**kwargs: Any,
|
||||
) -> Awaitable[ChatResponse]:
|
||||
if stream:
|
||||
raise ValueError("This sample only demonstrates non-streaming responses.")
|
||||
|
||||
self.last_messages = list(messages)
|
||||
|
||||
async def _get_response() -> ChatResponse:
|
||||
return ChatResponse(messages=[Message(role="assistant", text="done")])
|
||||
|
||||
return _get_response()
|
||||
|
||||
|
||||
def _build_messages() -> list[Message]:
|
||||
return [
|
||||
Message(role="user", text="Collect the deployment requirements."),
|
||||
Message(role="assistant", text="I will gather the constraints first."),
|
||||
Message(role="user", text="Summarize the rollout risks."),
|
||||
Message(role="assistant", text="The main risks are drift, downtime, and rollback gaps."),
|
||||
]
|
||||
|
||||
|
||||
def _token_count(message: Message) -> int | None:
|
||||
group_annotation = message.additional_properties.get(GROUP_ANNOTATION_KEY)
|
||||
if not isinstance(group_annotation, dict):
|
||||
return None
|
||||
value = group_annotation.get(GROUP_TOKEN_COUNT_KEY)
|
||||
return value if isinstance(value, int) else None
|
||||
|
||||
|
||||
def _print_model_input(title: str, client: InspectingChatClient) -> None:
|
||||
print(f"\n{title}")
|
||||
print(f"Model receives {len(client.last_messages)} message(s):")
|
||||
for message in client.last_messages:
|
||||
print(f"- [{message.role}] {message.text} ({_token_count(message)} tokens)")
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
# 1. Create one shared client with default compaction settings.
|
||||
shared_client = InspectingChatClient(
|
||||
compaction_strategy=TruncationStrategy(max_n=3, compact_to=2),
|
||||
tokenizer=FixedTokenizer(7),
|
||||
)
|
||||
|
||||
# 2. Create one agent that relies on the client defaults.
|
||||
client_default_agent = Agent(client=shared_client, name="ClientDefaultAgent")
|
||||
|
||||
# 3. Create another agent that overrides the shared client's defaults.
|
||||
agent_override = Agent(
|
||||
client=shared_client,
|
||||
name="AgentOverrideAgent",
|
||||
compaction_strategy=SlidingWindowStrategy(keep_last_groups=3),
|
||||
tokenizer=FixedTokenizer(11),
|
||||
)
|
||||
|
||||
# 4. Run the first agent; the client defaults are applied.
|
||||
await client_default_agent.run(_build_messages())
|
||||
_print_model_input("1. Client default compaction", shared_client)
|
||||
|
||||
# 5. Run the second agent; the agent-level override wins over the client defaults.
|
||||
await agent_override.run(_build_messages())
|
||||
_print_model_input("2. Agent-level override", shared_client)
|
||||
|
||||
# 6. Override both settings for a single run; the per-run values win over both.
|
||||
await agent_override.run(
|
||||
_build_messages(),
|
||||
compaction_strategy=TruncationStrategy(max_n=2, compact_to=1),
|
||||
tokenizer=FixedTokenizer(23),
|
||||
)
|
||||
_print_model_input("3. Per-run override", shared_client)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
"""
|
||||
Sample output:
|
||||
|
||||
1. Client default compaction
|
||||
Model receives 2 message(s):
|
||||
- [user] Summarize the rollout risks. (7 tokens)
|
||||
- [assistant] The main risks are drift, downtime, and rollback gaps. (7 tokens)
|
||||
|
||||
2. Agent-level override
|
||||
Model receives 3 message(s):
|
||||
- [assistant] I will gather the constraints first. (11 tokens)
|
||||
- [user] Summarize the rollout risks. (11 tokens)
|
||||
- [assistant] The main risks are drift, downtime, and rollback gaps. (11 tokens)
|
||||
|
||||
3. Per-run override
|
||||
Model receives 1 message(s):
|
||||
- [assistant] The main risks are drift, downtime, and rollback gaps. (23 tokens)
|
||||
"""
|
||||
@@ -0,0 +1,241 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
import asyncio
|
||||
from typing import Any
|
||||
|
||||
from agent_framework import (
|
||||
CharacterEstimatorTokenizer,
|
||||
ChatResponse,
|
||||
Content,
|
||||
Message,
|
||||
SelectiveToolCallCompactionStrategy,
|
||||
SlidingWindowStrategy,
|
||||
SummarizationStrategy,
|
||||
TokenBudgetComposedStrategy,
|
||||
ToolResultCompactionStrategy,
|
||||
TruncationStrategy,
|
||||
apply_compaction,
|
||||
)
|
||||
|
||||
"""This sample demonstrates selecting one compaction strategy at a time.
|
||||
|
||||
How to use this sample:
|
||||
- Keep one ``selected_strategy`` block active in ``main``.
|
||||
- Comment the active block and uncomment one of the alternatives to switch strategies.
|
||||
- Run again to compare behavior against the same "before" message list shown once.
|
||||
"""
|
||||
|
||||
SUMMARY_OF_MESSAGE_IDS_KEY = "_summary_of_message_ids"
|
||||
SUMMARIZED_BY_SUMMARY_ID_KEY = "_summarized_by_summary_id"
|
||||
|
||||
# Keep optional strategy classes imported for quick uncomment/switch in main().
|
||||
AVAILABLE_STRATEGY_TYPES = (
|
||||
TruncationStrategy,
|
||||
CharacterEstimatorTokenizer,
|
||||
SlidingWindowStrategy,
|
||||
SelectiveToolCallCompactionStrategy,
|
||||
ToolResultCompactionStrategy,
|
||||
SummarizationStrategy,
|
||||
TokenBudgetComposedStrategy,
|
||||
)
|
||||
|
||||
|
||||
class LocalSummaryClient:
|
||||
"""Simple local summarizer compatible with SupportsChatGetResponse."""
|
||||
|
||||
async def get_response(
|
||||
self,
|
||||
messages: list[Message],
|
||||
*,
|
||||
stream: bool = False,
|
||||
options: dict[str, Any] | None = None,
|
||||
**kwargs: Any,
|
||||
) -> ChatResponse:
|
||||
return ChatResponse(messages=[Message(role="assistant", text=f"Summary for {len(messages)} messages.")])
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
# 1. Build one baseline history and print it once.
|
||||
messages = [
|
||||
Message(role="system", text="You are a helpful assistant."),
|
||||
Message(role="user", text="Plan a data migration."),
|
||||
Message(role="assistant", text="I will gather requirements."),
|
||||
Message(
|
||||
role="assistant",
|
||||
contents=[
|
||||
Content.from_function_call(
|
||||
call_id="call_1",
|
||||
name="list_tables",
|
||||
arguments='{"db":"legacy"}',
|
||||
)
|
||||
],
|
||||
),
|
||||
Message(
|
||||
role="tool",
|
||||
contents=[
|
||||
Content.from_function_result(
|
||||
call_id="call_1",
|
||||
result="users, orders, events",
|
||||
)
|
||||
],
|
||||
),
|
||||
Message(role="assistant", text="I found three core tables."),
|
||||
Message(role="user", text="Estimate effort and risks."),
|
||||
Message(role="assistant", text="Primary risk is schema drift."),
|
||||
]
|
||||
print("\n--- Before compaction ---")
|
||||
print(f"Message count: {len(messages)}")
|
||||
for index, message in enumerate(messages, start=1):
|
||||
message_text = message.text or ", ".join(content.type for content in message.contents)
|
||||
print(f"{index:02d}. [{message.role}] {message_text}")
|
||||
|
||||
# 2. Select exactly one strategy (default shown below).
|
||||
# Truncate when included history exceeds 5 messages, then keep 4.
|
||||
# System remains anchored, so the oldest non-system messages are removed first.
|
||||
# selected_strategy_name = "TruncationStrategy"
|
||||
# selected_strategy = TruncationStrategy(max_n=5, compact_to=4, preserve_system=True)
|
||||
|
||||
# Keep the most recent 4 non-system groups and preserve the system anchor.
|
||||
# A group represents a user turn (and related assistant/tool follow-up).
|
||||
# selected_strategy_name = "SlidingWindowStrategy"
|
||||
# selected_strategy = SlidingWindowStrategy(keep_last_groups=4, preserve_system=True)
|
||||
|
||||
# This means all tool-call groups are removed (assistant function_call message
|
||||
# plus matching tool result messages). In this example, setting to 0 removes
|
||||
# the single assistant+tool pair.
|
||||
selected_strategy_name = "SelectiveToolCallCompactionStrategy"
|
||||
selected_strategy = SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=0)
|
||||
|
||||
# Collapse older tool-call groups into short "[Tool results: tool_name]" summaries
|
||||
# while keeping the most recent group verbatim. Unlike SelectiveToolCallCompactionStrategy
|
||||
# which fully excludes groups, this preserves a readable trace of tool usage.
|
||||
# selected_strategy_name = "ToolResultCompactionStrategy"
|
||||
# selected_strategy = ToolResultCompactionStrategy(keep_last_tool_call_groups=0)
|
||||
|
||||
# Summarize older messages so only recent context remains, and attach summary
|
||||
# trace metadata linking summary -> originals and originals -> summary.
|
||||
# summary_client = LocalSummaryClient()
|
||||
# selected_strategy_name = "SummarizationStrategy"
|
||||
# selected_strategy = SummarizationStrategy(
|
||||
# client=summary_client, target_count=3, threshold=2
|
||||
# )
|
||||
|
||||
# tokenizer = CharacterEstimatorTokenizer()
|
||||
# selected_strategy_name = "TokenBudgetComposedStrategy"
|
||||
# selected_strategy = TokenBudgetComposedStrategy(
|
||||
# token_budget=150,
|
||||
# tokenizer=tokenizer,
|
||||
# strategies=[
|
||||
# SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=0),
|
||||
# SlidingWindowStrategy(keep_last_groups=2),
|
||||
# ],
|
||||
# )
|
||||
|
||||
# 3. Apply the selected strategy and print projected output.
|
||||
projected = await apply_compaction(messages, strategy=selected_strategy)
|
||||
print(f"\n--- After compaction ({selected_strategy_name}) ---")
|
||||
print(f"Message count: {len(projected)}")
|
||||
for index, message in enumerate(projected, start=1):
|
||||
message_text = message.text or ", ".join(content.type for content in message.contents)
|
||||
print(f"{index:02d}. [{message.role}] {message_text}")
|
||||
|
||||
summaries = []
|
||||
summarized = []
|
||||
for message in messages:
|
||||
group_annotation = message.additional_properties.get("_group")
|
||||
if not isinstance(group_annotation, dict):
|
||||
continue
|
||||
if group_annotation.get(SUMMARY_OF_MESSAGE_IDS_KEY):
|
||||
summaries.append(message)
|
||||
if group_annotation.get(SUMMARIZED_BY_SUMMARY_ID_KEY):
|
||||
summarized.append(message)
|
||||
if summaries or summarized:
|
||||
print("Summary trace metadata present:")
|
||||
for message in summaries:
|
||||
group_annotation = message.additional_properties.get("_group")
|
||||
summarized_ids = (
|
||||
group_annotation.get(SUMMARY_OF_MESSAGE_IDS_KEY) if isinstance(group_annotation, dict) else None
|
||||
)
|
||||
print(f" summary_id={message.message_id} summarizes={summarized_ids}")
|
||||
for message in summarized:
|
||||
group_annotation = message.additional_properties.get("_group")
|
||||
summarized_by = (
|
||||
group_annotation.get(SUMMARIZED_BY_SUMMARY_ID_KEY) if isinstance(group_annotation, dict) else None
|
||||
)
|
||||
print(f" original_id={message.message_id} summarized_by={summarized_by}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
"""
|
||||
Sample output (always present):
|
||||
--- Before compaction ---
|
||||
Message count: 8
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [user] Plan a data migration.
|
||||
03. [assistant] I will gather requirements.
|
||||
04. [assistant] function_call
|
||||
05. [tool] function_result
|
||||
06. [assistant] I found three core tables.
|
||||
07. [user] Estimate effort and risks.
|
||||
08. [assistant] Primary risk is schema drift.
|
||||
"""
|
||||
|
||||
"""
|
||||
Sample output (varies based on selected strategy):
|
||||
--- After compaction (TruncationStrategy) ---
|
||||
Message count: 4
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [assistant] I found three core tables.
|
||||
03. [user] Estimate effort and risks.
|
||||
04. [assistant] Primary risk is schema drift.
|
||||
|
||||
--- After compaction (SlidingWindowStrategy) ---
|
||||
Message count: 6
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [assistant] function_call
|
||||
03. [tool] function_result
|
||||
04. [assistant] I found three core tables.
|
||||
05. [user] Estimate effort and risks.
|
||||
06. [assistant] Primary risk is schema drift.
|
||||
|
||||
--- After compaction (SelectiveToolCallCompactionStrategy) ---
|
||||
Message count: 6
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [user] Plan a data migration.
|
||||
03. [assistant] I will gather requirements.
|
||||
04. [assistant] I found three core tables.
|
||||
05. [user] Estimate effort and risks.
|
||||
06. [assistant] Primary risk is schema drift.
|
||||
|
||||
--- After compaction (ToolResultCompactionStrategy) ---
|
||||
Message count: 7
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [assistant] [Tool results: list_tables]
|
||||
03. [user] Plan a data migration.
|
||||
04. [assistant] I will gather requirements.
|
||||
05. [assistant] I found three core tables.
|
||||
06. [user] Estimate effort and risks.
|
||||
07. [assistant] Primary risk is schema drift.
|
||||
|
||||
--- After compaction (SummarizationStrategy) ---
|
||||
Message count: 5
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [assistant] Summary for 2 messages.
|
||||
03. [assistant] I found three core tables.
|
||||
04. [user] Estimate effort and risks.
|
||||
05. [assistant] Primary risk is schema drift.
|
||||
Summary trace metadata present:
|
||||
summary_id=summary_8 summarizes=['msg_1', 'msg_2', 'msg_3', 'msg_4']
|
||||
original_id=msg_1 summarized_by=summary_8
|
||||
original_id=msg_2 summarized_by=summary_8
|
||||
original_id=msg_3 summarized_by=summary_8
|
||||
original_id=msg_4 summarized_by=summary_8
|
||||
|
||||
--- After compaction (TokenBudgetComposedStrategy) ---
|
||||
Message count: 3
|
||||
01. [system] You are a helpful assistant.
|
||||
02. [user] Estimate effort and risks.
|
||||
03. [assistant] Primary risk is schema drift.
|
||||
"""
|
||||
@@ -0,0 +1,249 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
import asyncio
|
||||
from collections.abc import Sequence
|
||||
from typing import Any
|
||||
|
||||
from agent_framework import (
|
||||
Agent,
|
||||
ChatContext,
|
||||
CompactionProvider,
|
||||
InMemoryHistoryProvider,
|
||||
Message,
|
||||
SlidingWindowStrategy,
|
||||
ToolResultCompactionStrategy,
|
||||
chat_middleware,
|
||||
tool,
|
||||
)
|
||||
from agent_framework.openai import OpenAIChatClient
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
||||
"""
|
||||
CompactionProvider with Agent Example
|
||||
|
||||
Demonstrates ``CompactionProvider`` as part of a real agent's context-provider
|
||||
pipeline alongside ``InMemoryHistoryProvider``.
|
||||
|
||||
The compaction provider uses two separate strategies:
|
||||
|
||||
- ``before_strategy``: Applied to the loaded history before the model sees it.
|
||||
Here a ``SlidingWindowStrategy`` keeps only the last 3 message groups, so
|
||||
older turns get dropped as the conversation grows.
|
||||
- ``after_strategy``: Applied to the stored history after each turn.
|
||||
Here a ``ToolResultCompactionStrategy`` collapses all but the most recent
|
||||
tool-call group into short ``[Tool results: ...]`` summaries.
|
||||
|
||||
A chat middleware logs the messages the model actually receives (after context
|
||||
providers and compaction have run) so you can see the effect of compaction.
|
||||
|
||||
This sample intentionally is too aggressive in excluding content, because you can see
|
||||
that the last turn actually does not have the full context any longer and is therefore
|
||||
only comparing the results from Paris and Tokyo and not from London.
|
||||
|
||||
Run with:
|
||||
uv run samples/02-agents/compaction/compaction_provider.py
|
||||
"""
|
||||
|
||||
|
||||
@tool(approval_mode="never_require")
|
||||
def get_weather(city: str) -> str:
|
||||
"""Get the current weather for a city."""
|
||||
weather_data = {
|
||||
"London": "cloudy, 12°C",
|
||||
"Paris": "sunny, 18°C",
|
||||
"Tokyo": "rainy, 22°C",
|
||||
}
|
||||
return weather_data.get(city, f"No data for {city}")
|
||||
|
||||
|
||||
@chat_middleware
|
||||
async def log_model_input(context: ChatContext, call_next: Any) -> None:
|
||||
"""Chat middleware that logs the messages sent to the model (after compaction)."""
|
||||
msgs: Sequence[Message] = context.messages
|
||||
print(f"\n Model receives {len(msgs)} messages:")
|
||||
for i, m in enumerate(msgs, 1):
|
||||
text = m.text or ", ".join(c.type for c in m.contents)
|
||||
print(f" {i:02d}. [{m.role}] {text[:70]}")
|
||||
await call_next()
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
client = OpenAIChatClient(model_id="gpt-4o-mini")
|
||||
|
||||
# History provider loads/stores conversation messages in session.state.
|
||||
# skip_excluded=True means get_messages() will omit messages that were
|
||||
# marked as excluded by the CompactionProvider's after_strategy.
|
||||
history = InMemoryHistoryProvider(skip_excluded=True)
|
||||
|
||||
compaction = CompactionProvider(
|
||||
# BEFORE each turn: SlidingWindow drops older message groups from
|
||||
# the loaded context so the model's input stays bounded. With
|
||||
# keep_last_groups=3, only the 3 most recent non-system groups are
|
||||
# sent to the model — older turns are not shown to the model.
|
||||
before_strategy=SlidingWindowStrategy(keep_last_groups=3, preserve_system=True),
|
||||
# AFTER each turn: ToolResultCompaction marks older tool-call groups
|
||||
# (assistant function_call + tool result messages) as excluded and
|
||||
# inserts a short "[Tool results: ...]" summary. The original messages
|
||||
# stay in storage with _excluded=True; skip_excluded on the history
|
||||
# provider ensures they won't be loaded on the next turn.
|
||||
after_strategy=ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
|
||||
history_source_id=history.source_id,
|
||||
)
|
||||
|
||||
# Provider order matters:
|
||||
# before_run: history loads → compaction trims (forward order)
|
||||
# after_run: compaction marks exclusions → history stores (reverse order)
|
||||
agent = Agent(
|
||||
client=client,
|
||||
name="WeatherAssistant",
|
||||
instructions="You are a helpful weather assistant. Use the get_weather tool when asked about weather.",
|
||||
tools=[get_weather],
|
||||
context_providers=[history, compaction],
|
||||
middleware=[log_model_input],
|
||||
)
|
||||
|
||||
session = agent.create_session()
|
||||
|
||||
queries = [
|
||||
"What is the weather in London?",
|
||||
"How about Paris?",
|
||||
"And Tokyo?",
|
||||
"Which city is the warmest?",
|
||||
]
|
||||
|
||||
for turn, query in enumerate(queries, 1):
|
||||
print(f"\n{'=' * 60}")
|
||||
print(f"Turn {turn} — User: {query}")
|
||||
|
||||
# ── What is in the persistent store right now? ──
|
||||
# This shows ALL messages the history provider has accumulated,
|
||||
# including any that were marked as excluded by the after_strategy
|
||||
# on the previous turn. Messages marked ✗ are excluded and won't
|
||||
# be loaded because skip_excluded=True on the history provider.
|
||||
stored = session.state.get(history.source_id, {}).get("messages", [])
|
||||
if stored:
|
||||
excluded_count = sum(1 for m in stored if m.additional_properties.get("_excluded", False))
|
||||
print(f"\n Stored history: {len(stored)} messages ({excluded_count} excluded)")
|
||||
for i, m in enumerate(stored, 1):
|
||||
text = m.text or ", ".join(c.type for c in m.contents)
|
||||
excluded = m.additional_properties.get("_excluded", False)
|
||||
reason = m.additional_properties.get("_exclude_reason", "")
|
||||
if excluded:
|
||||
marker = f" ✗ ({reason})"
|
||||
elif (m.text or "").startswith("[Tool results:"):
|
||||
marker = " ← summary"
|
||||
else:
|
||||
marker = ""
|
||||
print(f" {i:02d}. [{m.role}]{marker} {text[:65]}")
|
||||
|
||||
# ── What the model actually sees ──
|
||||
# The chat middleware fires AFTER the full context pipeline:
|
||||
# 1. InMemoryHistoryProvider loads non-excluded stored messages
|
||||
# 2. CompactionProvider.before_strategy (SlidingWindow) drops
|
||||
# older groups so only the last 3 non-system groups survive
|
||||
# 3. The agent prepends instructions and appends the new user input
|
||||
# So this list is shorter than what's in storage.
|
||||
result = await agent.run(query, session=session)
|
||||
|
||||
# ── What happens after the turn ──
|
||||
# The agent's after_run pipeline runs in reverse provider order:
|
||||
# 1. CompactionProvider.after_strategy (ToolResultCompaction) marks
|
||||
# older tool-call groups as excluded in the stored messages —
|
||||
# their assistant+tool messages get ✗ and a summary is inserted
|
||||
# 2. InMemoryHistoryProvider appends the new input + response
|
||||
# On the NEXT turn, skip_excluded=True means the ✗ messages won't load.
|
||||
print(f"\n Agent: {result.text}")
|
||||
|
||||
print(f"\n{'=' * 60}")
|
||||
print("Done.")
|
||||
|
||||
|
||||
"""
|
||||
Example output:
|
||||
============================================================
|
||||
Turn 1 — User: What is the weather in London?
|
||||
|
||||
Model receives 1 messages:
|
||||
01. [user] What is the weather in London?
|
||||
|
||||
Agent: The weather in London is cloudy with a temperature of 12°C.
|
||||
|
||||
============================================================
|
||||
Turn 2 — User: How about Paris?
|
||||
|
||||
Stored history: 4 messages (0 excluded)
|
||||
01. [user] What is the weather in London?
|
||||
02. [assistant] function_call
|
||||
03. [tool] function_result
|
||||
04. [assistant] The weather in London is cloudy with a temperature of 12°C.
|
||||
|
||||
Model receives 5 messages:
|
||||
01. [user] What is the weather in London?
|
||||
02. [assistant] function_call
|
||||
03. [tool] function_result
|
||||
04. [assistant] The weather in London is cloudy with a temperature of 12°C.
|
||||
05. [user] How about Paris?
|
||||
|
||||
Agent: The weather in Paris is sunny with a temperature of 18°C.
|
||||
|
||||
============================================================
|
||||
Turn 3 — User: And Tokyo?
|
||||
|
||||
Stored history: 8 messages (0 excluded)
|
||||
01. [user] What is the weather in London?
|
||||
02. [assistant] function_call
|
||||
03. [tool] function_result
|
||||
04. [assistant] The weather in London is cloudy with a temperature of 12°C.
|
||||
05. [user] How about Paris?
|
||||
06. [assistant] function_call
|
||||
07. [tool] function_result
|
||||
08. [assistant] The weather in Paris is sunny with a temperature of 18°C.
|
||||
|
||||
Model receives 5 messages:
|
||||
01. [assistant] The weather in London is cloudy with a temperature of 12°C.
|
||||
02. [assistant] function_call
|
||||
03. [tool] function_result
|
||||
04. [assistant] The weather in Paris is sunny with a temperature of 18°C.
|
||||
05. [user] And Tokyo?
|
||||
|
||||
Agent: The weather in Tokyo is rainy with a temperature of 22°C.
|
||||
|
||||
============================================================
|
||||
Turn 4 — User: Which city is the warmest?
|
||||
|
||||
Stored history: 13 messages (3 excluded)
|
||||
01. [user] What is the weather in London?
|
||||
02. [assistant] ← summary [Tool results: get_weather: cloudy, 12°C]
|
||||
03. [assistant] ✗ (tool_result_compaction) function_call
|
||||
04. [tool] ✗ (tool_result_compaction) function_result
|
||||
05. [assistant] The weather in London is cloudy with a temperature of 12°C.
|
||||
06. [user] ✗ (tool_result_compaction) How about Paris?
|
||||
07. [assistant] function_call
|
||||
08. [tool] function_result
|
||||
09. [assistant] The weather in Paris is sunny with a temperature of 18°C.
|
||||
10. [user] And Tokyo?
|
||||
11. [assistant] function_call
|
||||
12. [tool] function_result
|
||||
13. [assistant] The weather in Tokyo is rainy with a temperature of 22°C.
|
||||
|
||||
Model receives 8 messages:
|
||||
01. [assistant] function_call
|
||||
02. [tool] function_result
|
||||
03. [assistant] The weather in Paris is sunny with a temperature of 18°C.
|
||||
04. [user] And Tokyo?
|
||||
05. [assistant] function_call
|
||||
06. [tool] function_result
|
||||
07. [assistant] The weather in Tokyo is rainy with a temperature of 22°C.
|
||||
08. [user] Which city is the warmest?
|
||||
|
||||
Agent: Tokyo is the warmest city with a temperature of 22°C, compared to Paris, which is at 18°C.
|
||||
|
||||
============================================================
|
||||
Done.
|
||||
"""
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1,89 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
import asyncio
|
||||
|
||||
from agent_framework import (
|
||||
Message,
|
||||
annotate_message_groups,
|
||||
apply_compaction,
|
||||
included_messages,
|
||||
)
|
||||
|
||||
"""This sample demonstrates authoring a custom compaction strategy.
|
||||
|
||||
The custom strategy keeps system messages and the most recent user turn while
|
||||
excluding older non-system groups.
|
||||
"""
|
||||
|
||||
EXCLUDED_KEY = "_excluded"
|
||||
GROUP_ANNOTATION_KEY = "_group"
|
||||
|
||||
|
||||
class KeepLastUserTurnStrategy:
|
||||
async def __call__(self, messages: list[Message]) -> bool:
|
||||
group_ids = annotate_message_groups(messages)
|
||||
group_kinds: dict[str, str] = {}
|
||||
for message in messages:
|
||||
group_annotation = message.additional_properties.get(GROUP_ANNOTATION_KEY)
|
||||
group_id = group_annotation.get("id") if isinstance(group_annotation, dict) else None
|
||||
kind = group_annotation.get("kind") if isinstance(group_annotation, dict) else None
|
||||
if (
|
||||
isinstance(group_id, str)
|
||||
and isinstance(kind, str)
|
||||
and group_id not in group_kinds
|
||||
):
|
||||
group_kinds[group_id] = kind
|
||||
user_group_ids = [
|
||||
group_id for group_id in group_ids if group_kinds.get(group_id) == "user"
|
||||
]
|
||||
if not user_group_ids:
|
||||
return False
|
||||
keep_user_group_id = user_group_ids[-1]
|
||||
|
||||
changed = False
|
||||
for message in messages:
|
||||
group_annotation = message.additional_properties.get(GROUP_ANNOTATION_KEY)
|
||||
group_id = group_annotation.get("id") if isinstance(group_annotation, dict) else None
|
||||
if message.role == "system":
|
||||
continue
|
||||
if group_id == keep_user_group_id:
|
||||
continue
|
||||
if message.additional_properties.get(EXCLUDED_KEY) is not True:
|
||||
changed = True
|
||||
message.additional_properties[EXCLUDED_KEY] = True
|
||||
return changed
|
||||
|
||||
|
||||
def _messages() -> list[Message]:
|
||||
return [
|
||||
Message(role="system", text="You are concise."),
|
||||
Message(role="user", text="first request"),
|
||||
Message(role="assistant", text="first response"),
|
||||
Message(role="user", text="second request"),
|
||||
Message(role="assistant", text="second response"),
|
||||
]
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
# 1. Build a short conversation.
|
||||
messages = _messages()
|
||||
print(f"Number of messages before compaction: {len(messages)}")
|
||||
# 2. Apply custom strategy.
|
||||
await apply_compaction(messages, strategy=KeepLastUserTurnStrategy())
|
||||
# 3. Print projected messages.
|
||||
projected = included_messages(messages)
|
||||
print(f"Number of messages after compaction: {len(projected)}")
|
||||
for msg in projected:
|
||||
print(f"[{msg.role}] {msg.text}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
"""
|
||||
Sample output:
|
||||
Number of messages before compaction: 5
|
||||
Number of messages after compaction: 2
|
||||
[system] You are concise.
|
||||
[user] second request
|
||||
"""
|
||||
@@ -0,0 +1,124 @@
|
||||
# /// script
|
||||
# requires-python = ">=3.10"
|
||||
# dependencies = [
|
||||
# "tiktoken",
|
||||
# ]
|
||||
# ///
|
||||
# Run with: uv run samples/02-agents/compaction/tiktoken_tokenizer.py
|
||||
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
|
||||
import asyncio
|
||||
from typing import Any
|
||||
|
||||
import tiktoken
|
||||
from agent_framework import (
|
||||
Message,
|
||||
TokenizerProtocol,
|
||||
TruncationStrategy,
|
||||
annotate_message_groups,
|
||||
apply_compaction,
|
||||
included_token_count,
|
||||
)
|
||||
|
||||
"""This sample demonstrates a custom TokenizerProtocol implementation with tiktoken.
|
||||
|
||||
Key components:
|
||||
- `TiktokenTokenizer` backed by `tiktoken`
|
||||
- Token-based `TruncationStrategy` (`max_n` / `compact_to`)
|
||||
- Inspecting projected roles and remaining included token count
|
||||
"""
|
||||
|
||||
|
||||
class TiktokenTokenizer(TokenizerProtocol):
|
||||
"""TokenizerProtocol implementation backed by tiktoken's o200k_base (gpt-4.1 and up default) encoding."""
|
||||
|
||||
def __init__(
|
||||
self, *, encoding_name: str = "o200k_base", model_name: str | None = None
|
||||
) -> None:
|
||||
if model_name is not None:
|
||||
self._encoding = tiktoken.encoding_for_model(model_name)
|
||||
else:
|
||||
self._encoding: Any = tiktoken.get_encoding(encoding_name)
|
||||
|
||||
def count_tokens(self, text: str) -> int:
|
||||
return len(self._encoding.encode(text))
|
||||
|
||||
|
||||
def _build_messages() -> list[Message]:
|
||||
return [
|
||||
Message(role="system", text="You are a migration assistant."),
|
||||
Message(
|
||||
role="user",
|
||||
text="List all migration risks and include detailed mitigations for each risk category.",
|
||||
),
|
||||
Message(
|
||||
role="assistant",
|
||||
text=(
|
||||
"Primary risks include schema drift, missing foreign key constraints, "
|
||||
"and data quality regressions. Mitigations include staged validation, "
|
||||
"shadow writes, and replay-based verification."
|
||||
),
|
||||
),
|
||||
Message(
|
||||
role="user",
|
||||
text=(
|
||||
"Now provide a detailed checklist with owners, rollback "
|
||||
"gates, and validation criteria."
|
||||
),
|
||||
),
|
||||
Message(
|
||||
role="assistant",
|
||||
text=(
|
||||
"Checklist: baseline snapshots, migration dry-run, production "
|
||||
"canary, progressive deployment, automated integrity checks, and "
|
||||
"post-migration reconciliation."
|
||||
),
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
# 1. Create a tokenizer implementation that uses tiktoken.
|
||||
tokenizer = TiktokenTokenizer()
|
||||
|
||||
# 2. Configure token-based truncation.
|
||||
strategy = TruncationStrategy(
|
||||
max_n=250,
|
||||
compact_to=150,
|
||||
tokenizer=tokenizer,
|
||||
preserve_system=True,
|
||||
)
|
||||
|
||||
# 3. Build conversation and measure token count before compaction.
|
||||
messages = _build_messages()
|
||||
annotate_message_groups(messages, tokenizer=tokenizer)
|
||||
token_count_before = included_token_count(messages)
|
||||
|
||||
# 4. Apply compaction and measure token count after compaction.
|
||||
projected = await apply_compaction(messages, strategy=strategy, tokenizer=tokenizer)
|
||||
token_count_after = included_token_count(messages)
|
||||
|
||||
# 5. Print before/after token counts and projected conversation.
|
||||
print(f"Projected messages: {len(projected)}")
|
||||
print(f"Included token count before compaction: {token_count_before}")
|
||||
print(f"Included token count after compaction: {token_count_after}")
|
||||
print("Projected roles:", [message.role for message in projected])
|
||||
for message in projected:
|
||||
token_count = message.additional_properties.get("_group", {}).get("token_count")
|
||||
print(f"- [{message.role}] {message.text} ({token_count} tokens)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
"""
|
||||
Projected messages: 3
|
||||
Included token count before compaction: 263
|
||||
Included token count after compaction: 149
|
||||
Projected roles: ['system', 'user', 'assistant']
|
||||
- [system] You are a migration assistant. (40 tokens)
|
||||
- [user] Now provide a detailed checklist with owners, rollback gates, and validation criteria. (49 tokens)
|
||||
- [assistant] Checklist: baseline snapshots, migration dry-run, production canary,
|
||||
progressive deployment, automated integrity checks, and post-migration reconciliation. (60 tokens)
|
||||
"""
|
||||
Reference in New Issue
Block a user