Design doc draft (#5)

* wip

* wip

* wip

* wip

* wip

* wip

* Update docs/design/main.md

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

* Update docs/design/main.md

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* update

* update

* update

* wip

* wip

* wip

* wip

* address comment

* update

* add custom agent example

* address comment

* update code teaser

* Update docs/design/main.md

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

* update

* address comments

* update guardrails

* address some of mark's comments

* add new separate sections for agents and workflows

* update agent doc

* Update agent.md

Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>

* add foundry agent doc

* wip

* refine the component registration interface with agent runtime

* update

* workflows

* update

* update

* Update

* Update

* update

* Update design doc to remove runtime

* Update

* Update

* Update

* update

* Add eval section notes (#9)

* add notes on eval

* remove duplicate title

* update docs

* update docs

* save updates before merge

* update evaluation script

* Update agents.md

* update workflows

* Update

Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>

* update workflow

* Updated design doc

* Update

* Update

* update

* update

* Update

* update

* update

* Update

* update

* Update with agent abstraction alternatives

* Update discussion

* Update

* update

* Update

* Update

* Update

* Update

---------

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>
Co-authored-by: Victor Dibia <chuvidi2003@gmail.com>
This commit is contained in:
Eric Zhu
2025-05-29 12:36:54 -07:00
committed by GitHub
Unverified
parent d0531bb93b
commit 6089446f04
14 changed files with 1528 additions and 0 deletions
+103
View File
@@ -0,0 +1,103 @@
# Agent Framework Design Doc
What values does the framework provide?
- A set of configurable, extensible and high-quality components (e.g., model clients, tools, MCP servers and memory).
- An easy path for deploying, securing and scaling applications, both locally and in the cloud.
- Integration with tools for monitoring, debugging, evaluation and optimization, both locally and in the cloud.
- A community of developers and users for support, ideas, and contributions, benefiting everyone in the ecosystem.
What is this document?
- An overview of the new framework.
- Defining the major elements of the framework and their relationships.
- Detailed design of each element and its implementation will be in a separate document.
## Core Data Types
To unify the interaction between components, we define a set of core
data types that are used throughout the framework.
See [Core Data Types](types.md) for more details.
## Components
A component is a class that provides a specific functionality and can be used
independently by applications.
There are two types of components in the framework: agent components and agents. Agent components are the building blocks of agents, while agents are
the higher-level components, and can be composed from agent components
and other agents (as in workflows).
The framework defines the following components. Follow the links to
find the design details of each component:
- Agent Components:
- [Model Client](models.md)
- [Vector Store and Embedding Client](vector-stores.md)
- [Tool](tools.md)
- [MCP Server](mcp-servers.md)
- [Memory](memory.md)
- [Thread](threads.md)
- [Guardrail](guardrails.md)
- Agent and Workflow:
- [Agent](agents.md)
- [Workflow](workflows.md)
### Composition
Components can be composed to create complex components. For example,
an agent can be composed from model clients, tools and memory,
and a tool can be composed from an agent or a workflow.
It is the responsibility of the framework to validate components
and their composition.
### Configuration
A component can be created from a set of serializable configuration parameters,
with the help of dependency injection to resolve non-serializable dependencies.
### Relationships
The following diagram shows the component relationship of the framework:
```mermaid
graph TD
Component[Component] --> |extends| Agent[Agent]
Agent --> |extends| Workflow[Workflow]
Component --> |extends| ModelClient[Model Client]
Component --> |extends| VectorStore[Vector Store]
Component --> |extends| EmbeddingClient[Embedding Client]
Component --> |extends| Tool[Tool]
Component --> |extends| MCPServer[MCP Server]
Component --> |extends| Memory[Memory]
Component --> |extends| Thread[Thread]
Component --> |extends| Guardrail[Guardrail]
Agent --> |uses| uses1[Model Client]
Agent --> |uses| uses2[Thread]
Agent --> |uses| uses3[Tools and MCP Servers]
Agent --> |uses| uses4[Memory]
Agent --> |uses| uses5[Guardrail]
Workflow --> |contains| contains[Child Agents]
Memory --> |uses| uses5[Vector Store]
VectorStore --> |uses| uses6[Embedding Client]
```
## Deployment and Scaling
[Deployment](deployment.md).
## Observability and Monitoring
[Observability](observability.md).
## Evaluation
[Evaluation](evaluation.md).
## Optimization
[Optimization](optimization.md).
+476
View File
@@ -0,0 +1,476 @@
# Agents
An agent is a component that processes messages in a thread and returns a result.
During its handling of messages, an agent may:
- Use model client to process messages,
- Use thread to keep track of the interaction with the model,
- Invoke tools or MCP servers, and
- Retrieve and store data through memory.
It is up to the implementation of the agent class to decide how these components are used.
__An important design goal of the framework is to ensure the developer experience
of creating custom agent is as easy as possible.__ Existing frameworks
have made "kitchen-sink" agents that are hard to understand and maintain.
An agent might not use the components provided by the framework to implement
the agent interface.
Azure AI Agent is an example of such agent: its implementation is
backed by the Azure AI Agent Service.
The framework provides a set of pre-built agents:
- `ChatCompletionAgent`: an agent that uses a chat-completion model to process messages
and use thread, memory, tools and MCP servers in a configurable way. __If we can make
custom agents easy to implement, we can remove this agent.__
- `AzureAIAgent`: an agent that is backed by Azure AI Agent Service.
- `ResponsesAgent`: an agent that is backed by OpenAI's Responses API.
- `A2AAgent`: an agent that is backed by the [A2A Protocol](https://google.github.io/A2A/documentation/).
## `Agent` protocol
```python
class Agent(Protocol):
"""The protocol for all agents in the framework."""
async def run(
self,
thread: Thread,
context: Context,
) -> Result:
"""The method to run the agent on a thread of messages, and return the result.
Args:
thread: The thread of messages to process: it may be a local thread
or a stub thread that is backed by a remote service.
context: The context for the current invocation of the agent, providing
access to the event channel, and human-in-the-loop (HITL) features.
Returns:
The result of running the agent, which includes the final response.
"""
...
@dataclass
class Context:
"""The context for the current invocation of the agent."""
event_handler: EventHandler
"""The event consumer for handling events emitted by the agent."""
user_input_source: UserInputSource
"""The user input source for requesting for user input during the agent run."""
... # Other fields, could be extended to include more for application-specific needs.
@dataclass
class Result:
"""The result of running an agent."""
final_response: Message
... # Other fields, could be extended to include more for application-specific needs.
```
## `ToolCallingAgent` example
Here is an example of a custom agent that calls a tool and returns the result.
The `ToolCallingAgent` implements the `Agent` base class and
it implements the `run` method to process incoming messages and call tools if needed.
```python
class ToolCallingAgent(Agent):
def __init__(
self,
model_client: ModelClient,
tools: list[Tool],
) -> None:
self.model_client = model_client
self.tools = tools
async def run(self, thread: Thread, context: Context) -> Result:
# Create a response using the model client, passing the thread and context.
create_result = await self.model_client.create(thread, context, tools=self.tools)
# Emit the event to notify the workflow consumer of a model response.
await context.emit(ModelResponseEvent(create_result))
if create_result.is_tool_call():
# Get user approval for the tool call through the context.
approval = await context.get_user_approval(create_result.tool_calls)
if not approval:
# ... return a canned response.
# Call the tools with the tool calls in the response.
tools = ... # Find the tool by name in the tools list.
tool_results = ... # Call the tool with the tool call arguments.
# Emit the event to notify the workflow consumer of a tool call.
await context.emit(ToolCallEvent(tool_result))
# Update the thread with the tool result.
await thread.append(tool_result.to_messages())
# Return the tool result as the response.
return Result(
final_response=tool_result,
)
else:
# Return the response as the result.
return Result(
final_response=create_result,
)
```
Things to note in the implementation of the `run` method:
- Orchestration of tools and model is completly customizable.
- Components such as `thread` and `model_client` interacts smoothly with little boilerplate code.
- The `context` parameter provides convenient access to the workflow run fixtures such as event channel.
An agent doesn't need to use components provided by the framework to implement the agent interface.
For example, in a multi-agent workflow, we may need a verification agent in a using deterministic
logic to critic another agent's response.
```python
class CriticAgent(Agent):
def __init__(self) -> None:
self.verification_logic = ... # Some verification logic, e.g. a set of rules.
async def run(self, thread: Thread, context: Context) -> Result:
# Use the verification logic to verify the last message in the thread.
response = thread.get_last_message()
is_verified = self.verification_logic.verify(response)
if is_verified:
final_response = Message("The response is verified.")
else:
final_response = Message("The response is not verified.")
return Result(
final_response=final_response,
)
```
## Run
A _run_ is a single invocation of the agent or a workflow given a thread of messages.
## Run agent
Developer can instantiate a subclass of `Agent` directly using it's constructor,
and run it by calling the `run` method.
```python
@FunctionTool
def my_tool(input: str) -> str:
return f"Tool result for {input}"
model_client = OpenAIChatCompletionClient("gpt-4.1")
agent = ToolCallingAgent(
model_client=model_client,
tools=[my_tool],
)
# Create a thread for the current task.
thread = [
Message("Hello"),
Message("Can you find the file 'foo.txt' for me?"),
]
# Create a context that uses a handler that prints emitted events to the console,
# and a user input source that reads from the console.
context = Context(event_handler=ConsoleEventHandler(), user_input_source=ConsoleUserInputSource())
# Run the agent with the thread and context.
result = await agent.run(thread, context)
```
## User session
A user session is a logical concept which involves a sequence of messages exchanged between the user and the agent.
Consider the following examples:
- A chat session in ChatGPT.
- A delegation of task to a workflow agent from a user, with data exchanged between the user
and the workflow such as occassional feedbacks from the user and status updates from the workflow.
A user session may involve multiple runs.
## User session state
Rather than classifying agents as stateless or stateful, we focus on how state is managed during a user session.
There are several states that an application may maintain during a user session:
- **Conversation or workflow state**. This is the conversation history or execution
history in a workflow. This state is typically owned and managed by the thread object.
- **Long-term memory**. This can be information relevant to the user,
such as user preferences, past interactions, or other relevant data.
This can also be information relevant to the task, such as past trajectories,
past results, or other task-related data. These states are typically
owned and managed by a memory object.
The thread is always passed through the agent's `run` method.
Whereas the memory is can be set through the constructor of the agent.
See the [Memory](memory.md) design document for more details on how memory
is used in the framework.
It is up to the application to decide whether to reuse state across different
user sessions. The framework should provide the necessary methods and storage layer integration
for persisting and retrieving state, but the application should decide how to use them.
## Run agent concurrently
If the agent just call models and tools that are stateless,
we can run the same instance of the agent concurrently.
```python
# Create threads for concurrent tasks.
thread1 = [
Message("Hello"),
Message("Can you find the file 'foo.txt' for me?"),
]
thread2 = [
Message("Hello"),
Message("Can you find the file 'bar.txt' for me?"),
]
# Run the agent concurrently on multiple threads.
results = await asyncio.gather(
agent.run(thread1, context),
agent.run(thread2, context),
)
# The `context`'s event handlers will emit events from both runs.
```
This is not always the right way to run concurrent agents, as some tools
or memory associated with the agent may not be concurrent-safe.
It is up the application to decide if an agent can run concurrently,
or multiple instances should be created for each thread.
## Using Foundry Agent Service
The framework offers a built-in agent class for users of the Foundry Agent Service.
The agent class essentially acts as a proxy to the agent hosted by the Foundry Agent Service.
```python
agent = FoundryAgent(
name="my_foundry_agent",
project_client="ProjectClient",
agent_id="my_agent_id", # If not provided, a new agent will be created.
deployment_name="my_deployment",
instruction="my_instruction",
... # Other parameters for the agent creation.
)
# Create a thread that is backed by the Foundry Agent Service.
thread = FoundryThread(thread_id="my_thread_id")
# Run the agent on the thread and an new context that emits events to the console.
result = await agent.run(thread, RunContext(event_channel="console"))
```
## Alternative agent abstractions
There are two alternatives:
1. **Agent with private conversation state**: The agent manages its own conversation state,
either by using a thread or other custom logics. The conversation state is
not shared with other agents or workflows. It is up to the agent to decide how
to manage the conversation state.
2. **Agent without conversation state**: The conversation state is externalized
and managed by a thread abstraction. The agent is invoked with a thread on
every run. While it can still use the thread to append messages etc., it loses
control over the conversation state the moment the run method returns.
### Protocol comparison
For agent with private conversation state, agent is invoked with new messages
and the agent is responsible for managing the conversation state while exposing
public methods for the orchestration code to manipulate its conversation state
indirectly.
```python
class Agent(Protocol):
async def run(
self,
messages: list[Message],
context: Context,
) -> Result:
"""The method to run the agent and return the result.
Args:
messages: The list of new messages to process.
context: The context for the current invocation of the agent, providing
access to the event channel, and human-in-the-loop (HITL) features.
Returns:
The result of running the agent, which includes the final response.
"""
...
async def reset() -> None:
"""Reset the conversation state of the agent."""
...
# And other methods for managing the conversation state.
```
For agent without conversation state, the agent is invoked with a thread
and the agent is responsible for processing the messages in the thread.
```python
class Agent(Protocol):
async def run(
self,
thread: Thread,
context: Context,
) -> Result:
"""The method to run the agent on a thread of messages, and return the result.
Args:
thread: The current conversation state.
context: The context for the current invocation of the agent, providing
access to the event channel, and human-in-the-loop (HITL) features.
Returns:
The result of running the agent, which includes the final response.
"""
...
```
### Constructor comparison
For agent with private conversation state, the agent is initialized with
the a state in addition to components like model client and tools, which could be a thread passed to the constructor,
or a custom state object that the agent uses to manage its conversation state.
```python
class CustomAgent(Agent):
def __init__(self,
model_client: ModelClient,
tools: list[Tool],
state: CustomState, # Could be a thread or a custom state object, or nothing at all.
) -> None:
self.model_client = model_client
self.tools = tools
self.state = state # Could be created by the agent within the constructor.
```
For agent without conversation state, the agent is initialized with
the components it needs to process messages, such as a model client and tools.
```python
class CustomAgent(Agent):
def __init__(
self,
model_client: ModelClient,
tools: list[Tool],
) -> None:
self.model_client = model_client
self.tools = tools
```
### Thread-Agent compatibility considerations
For agent with private conversation state, compatibility with thread is not a concern,
as this is completely managed by the agent itself.
For agent without conversation state, the thread must be compatible with the agent's
`run` method. For example, a `FoundryAgent` must work with a `FoundryThread`
because the thread is backed by the Foundry Agent Service, and the implementation
requires the thread to be compatible with the service's API.
Compatibility constraints:
- `FoundryAgent` must work with `FoundryThread`.
- `OpenAIAssistantAgent` must work with `OpenAIAssistantThread`.
- `ResponsesAgent` must work with `ResponsesThread`, when using the stateful mode of the Responses API.
### Workflow-Agent compatibility considerations
For agent with private conversation state, the orchestration code cannot directly
modifies the conversation state of every agent in the workflow.
This means that for resetting the conversation state, branching a conversation,
or other orchestration logic, the agent must provides public
methods for the orchestration code to manipulate its conversation state.
Potential methods (just initial ideas):
- `reset()` to reset the conversation state.
- `branch()` to create a new branch of the conversation state from an existing state.
Example: AutoGen's MagenticOne orchestration requires the agents to be able to
reset their conversation states during re-planning. It is reasonable to expect
other types of orchestration logic will require behavior like branching
or backtracking.
For agent without conversation state, the orchestration code can directly
manipulate the thread that is passed to the agent's `run` method. So the orchestration code
can clone, fork, or reset the thread as needed.
This also means that the agent's converstion state must be abstracted as a thread.
### Extensibility considerations
For agent with private conversation state, the management of the conversation state
is completely up to the agent implementation. This means that custom agents can
be created with different conversation state management strategies, such as:
- Using a custom thread implementation that provides additional features.
- Using a custom state object that provides additional features.
When using a custom state object, the developer must also implement
methods for exporting and importing the state.
For agent without conversation state, the thread abstraction is required to
encapsulate the conversation state and ensure that the agent's `run` method
can use it without any issues. This puts a constraint on the agent implementation,
and also what can be represented as state in the thread.
Though, if the thread abstraction is designed well, it relieves the developer
from implementing the conversation state management logic themselves.
The developer only needs to come up with custom thread when the built-in thread
abstraction does not work with their custom agent.
### Discussion
- Either agent or thread must manage the conversation state.
- The class that manages the conversation state must provide a way to manipulate
it for orchestration purposes.
- Isolate thread as a separate required abstraction may introduce compatibility
issues.
- A thread abstraction with methods for manipulating the conversation state
should always be provided by the framework, whether it is exposed again
through the agent or not.
In a scenario with built-in agents and built-in threads, the developer experience
is nearly identical except for agent without conversation state the developer
must ensure the thread is compatible with the agent's `run` method.
In a scenario with custom agents and built-in threads, the developer experience
is simpler for agent without conversation state, as the thread abstraction
is already provided by the framework and the agent can use it directly. Plus,
the developer doesn't need to implement the conversation state management logic
through the agent's other methods, which will mostly likely be boilerplate code.
In a scenario with built-in agents and custom threads, the developer experience
is nearly identical, as in either case the developer must ensure
the agent's `run` method is compatible with the thread or general state object.
In a scenario with custom agents and custom threads, the developer experience
is nearly identical, as in either case the developer must ensure
the agent's `run` method is compatible with the thread or general state object,
and that the state management logic is implemented in the agent or the thread.
| Scenario | Agent with Conversation State | Agent without Conversation State |
|----------|------------------------------------------|---------------------------------------------|
| Built-in Agents, Built-in Threads | Simpler -- it should just work as there is no compatibility issue at runtime | Developer must ensure thread compatibility with agent's `run` method at runtime |
| Custom Agents, Built-in Threads | Developer must implement state management methods on the agent. | Simpler, as thread abstraction is provided by the framework and agent can use it directly |
| Built-in Agents, Custom Threads | Developer must ensure compatibility of the custom thread or state with agent's `run` method | Developer must ensure compatibility of the custom thread with agent's `run` method |
| Custom Agents, Custom Threads | Developer is fully responsible for implementing state management. | Developer is fully responsible for implementing state management. |
Overall, the agent without conversation state abstraction
provides a simpler and more consistent developer experience, as it relies on
the thread abstraction provided by the framework. The downside is that
developer must ensure the thread used is compatible with the agent's `run` method
-- this can be mitigated by enforcing strong types and validation, as well as
built-in factory methods for creating new threads given the agent type.
Another factor to consider is that Semantic Kernel already has agent abstraction
that passes a thread per invocation, so it is easier for us to migrate to the
new interface.
> **We should continue to question this decision as we implement more agents and workflows, and revisit the design.**
View File
+129
View File
@@ -0,0 +1,129 @@
## Evaluation
The goal of Evaluation is to enable developers measure both the quality of agent responses and the efficiency of their decision-making processes.
### Core Evaluation Concepts
To enable effective evaluation (mindful of the fact that agents may be implemented with different approaches or even frameworks), it is useful to focus on the following core concepts:
- **Standardized Trajectory Format**: A unified representation of agent interactions (messages, tool calls, events) enabling consistent evaluation across different agent implementations.
- **Trajectory and Outcome Evaluation**: Analyze both the path an agent takes and the final response it generates. This includes evaluating the sequence of tool calls, the order of operations, and the final output.
### Evaluation Components
The framework provides these key evaluation components:
- **Trajectory Converter**: Transforms agent runs from various frameworks into a standardized format for evaluation.
- **Metrics Library**:
- Computation-based metrics: Direct algorithms that calculate objective measures without requiring a model
- Model-based metrics: Evaluation criteria that require an AI model to assess subjective qualities
- **Judge**: For model-based metrics, a judge is the LLM responsible for applying evaluation criteria. Different judge models can be selected based on evaluation needs.
- **Evaluator**: Coordinates the evaluation process by running computation-based metrics directly and applying judges to model-based metrics.
- **Integration**: Connect with cloud evaluation services including Azure AI Evaluation.
### (Example) Metrics
Metrics may be pointwise (evaluating a single response on some criteria) or pairwise (evaluating two responses against each other e.g., where some ground truth is available).
#### Computation-based Metrics
- **Tool Match**: Measures tool call sequence matching in various ways:
- Exact Match: Perfect match with reference sequence
- In-Order Match: Required tools called in correct order (extra steps allowed)
- Any-Order Match: All required tools called regardless of order
- **Precision**: Proportion of agent's tool calls that match reference tool calls.
- **Recall**: Proportion of reference tool calls included in the agent's tool calls.
- **Single Tool Usage**: Checks if a specific tool was used during the trajectory.
- **Tool Call Errors**: Measures rate of tool call failures or errors.
- **Latency**: Time required for agent to complete its task.
#### Model-based Metrics
- **Task Adherence**: Evaluates how well the agent's response addresses the assigned task.
- **Coherence**: Assesses logical flow and internal consistency of the response.
- **Safety**: Detects potential harmful content in responses.
- **Follows Trajectory**: Evaluates if the response logically follows from the tools used.
- **Efficiency**: Measures if the agent took an optimal path to reach the solution.
This can build on the suite of metrics provided by [Azure AI evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/agent-evaluate-sdk).
### Sample Developer Experience
**Sample Developer Experience:**
1. **Run Agent**: Execute your agent on tasks to generate trajectories.
2. **Create Trajectory**: Structure task, run data, and optional reference.
3. **Configure Metrics**: Select pre-built or custom metrics for evaluation.
4. **Evaluate**: Run evaluator to get scores and detailed results.
5. **Analyze**: Review metrics to identify improvements.
```python
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from agent_framework.evaluation import (
TrajectoryMatchMetric,
TaskAdherenceMetric,
Evaluator,
Trajectory
)
# Model configuration for judge
model_config = AzureOpenAIModelConfiguration(
azure_deployment="o3-mini",
api_version="2024-02-01",
temperature=0
)
# Run your agent
task = "What's the weather in Seattle?"
run = your_agent.run(task)
# Create trajectory object
trajectory = Trajectory(
task=task,
run=run,
reference=[ # Optional reference trajectory
{"type": "tool_call", "tool": "weather_api", "args": {"location": "Seattle"}},
{"type": "response", "content": "Weather information for Seattle"}
]
)
# Define metrics
trajectory_match = TrajectoryMatchMetric(match_type="exact")
task_adherence = TaskAdherenceMetric(
criteria={
"Task adherence": (
"Does the response address the user's request and incorporate "
"information from tool calls appropriately?"
)
},
rating_rubric={
"5": "Excellent - Fully addresses task with complete detail",
"4": "Good - Addresses most aspects effectively",
"3": "Adequate - Addresses core task, minor gaps",
"2": "Poor - Partial addressing with significant gaps",
"1": "Inadequate - Fails to address task properly"
}
)
# Create evaluator
evaluator = Evaluator(
metrics=[trajectory_match, task_adherence],
model_config=model_config,
trajectory=trajectory
)
# Run evaluation
result = evaluator.run()
# Results follow Azure format
print("Evaluation Results:")
for metric_name, score in result.items():
if isinstance(score, dict):
print(f"{metric_name}: {score.get('score', 'N/A')}")
print(f" Result: {score.get('result', 'N/A')}")
print(f" Reason: {score.get('reason', 'N/A')}")
else:
print(f"{metric_name}: {score}")
```
+50
View File
@@ -0,0 +1,50 @@
# Guardrails
The design goal is to provide a flexible and extensible way to implement guardrails
and a built-in set of guardrails that can be used for common use cases.
> NOTE: this is work in progress.
Guardrails can be template-based to adapt to different input data types, which
include:
- `Message` for agent messages.
- `ToolCall` for tool call requests.
- `ToolResult` for tool call results.
Guardrails are added to other components such as `ModelClient` and `MCPServer`
as hooks that are called before and after the main logic of the component.
For example, the `ModelClient` has methods to add input and output guardrails.
```python
model_client = ModelClient(...)
model_client.add_input_guardrails([
PIIGuardrail[Message](...),
SensitiveDataGuardrail[Message](...),
])
model_client.add_output_guardrails([
HarmfulContentGuardrail[Message](...),
])
```
Another example to show how to use a guardrail with an MCP server:
```python
guardrail = PIIGuardrail(
config={
"rules": [
{
"type": "email",
"action": "block"
},
{
"type": "phone",
"action": "block"
}
]
}
)
mcp_server = MCPServer(...)
mcp_server.add_output_guardrail(guardrail)
```
+71
View File
@@ -0,0 +1,71 @@
# MCP Servers
An MCP server is a component that wraps a session to an
[Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server.
The tools provided by MCP server should match the tool interface to ensure
minimal boilerplate code when dealing with both tools and MCP servers.
Other features like sampling and resources, should be accessible through
the MCP server interface as well.
## MCP Server base class (draft)
```python
class MCPServer(ABC):
"""The base class for all MCP servers in the framework."""
@abstractmethod
async def list_tools(self, context: Context) -> list[ToolSchema]:
"""List all available tools in the MCP server.
Returns:
A list of tool schemas available in the MCP server.
"""
...
@abstractmethod
async def call_tool(
self,
call: ToolCall,
context: Context,
) -> ToolResult:
"""Call a tool with the given name and arguments.
Args:
tool_name: The name of the tool to call.
args: The arguments to pass to the tool.
context: The context for the current invocation of the MCP server.
Returns:
The result of calling the tool.
"""
...
def add_input_guardrails(
self,
guardrails: list[InputGuardrail[ToolCall]]
) -> None:
"""Add input guardrails to the MCP server.
Args:
guardrails: The list of input guardrails to add.
"""
...
def add_output_guardrails(
self,
guardrails: list[OutputGuardrail[ToolResult]]
) -> None:
"""Add output guardrails to the MCP server.
Args:
guardrails: The list of output guardrails to add.
"""
...
```
MCP specs have other APIs. We should consider adding them as well.
+85
View File
@@ -0,0 +1,85 @@
# Model Clients
A model client is a component that implements a unified interface for
interacting with different language models. It exposes a standardized metadata
about the model it provides (e.g., model name, tool call and vision capabilities, etc.)
to support validation and composition with other components.
The framework provides a set of pre-built model clients:
- `OpenAIChatCompletionClient`
- `AzureOpenAIChatCompletionClient`
- `AzureOpenAIResponseClient`
- `AzureAIClient`
- `AnthropicClient`
- `GeminiClient`
- `HuggingFaceClient`
- `OllamaClient`
- `VLLMClient`
- `ONNXRuntimeClient`
- `BedrockClient`
- `NIMClient`
Prompt template is a component that is used by model clients to generate prompts with parameters set based on some injected context.
prompts with parameters set based on some injected context.
This gets into the actual interface and implementation detail of model clients,
so we just mention it here.
The design goal is to provide integration with a wide range of model providers,
including both open-source and commercial models, while maintaining a consistent
interface for developers to use.
## `ModelClient` base class (draft)
```python
class ModelClient(ABC):
"""The base class for all model clients in the framework."""
@abstractmethod
async def create(
self,
thread: Thread,
context: Context,
stream: bool = False,
tools: Optional[list[Tool]] = None,
output_format: Optional[OutputFormat] = None,
) -> Message:
"""Generate a response from the model based on the provided messages.
Args:
thread: The conversation context to generate a response.
context: The context for the current invocation of the model client.
This is for accessing event channels for streaming tokens.
stream: Whether to stream the response tokens.
tools: Optional list of tools to use for tool calling.
output_format: Optional structured output format for the response.
If provided, the model will generate a response in this format
and returns a structured response message.
Returns:
The generated response message.
"""
...
def add_input_guardrails(
self,
guardrails: list[InputGuardrail[Message]]
) -> None:
"""Add input guardrails to the model client.
Args:
guardrails: The list of input guardrails to add.
"""
...
def add_output_guardrails(
self,
guardrails: list[OutputGuardrail[Message]]
) -> None:
"""Add output guardrails to the model client.
Args:
guardrails: The list of output guardrails to add.
"""
...
```
+3
View File
@@ -0,0 +1,3 @@
# Observability and Monitoring
Traces should follow the [OTEL GenAI Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+10
View File
@@ -0,0 +1,10 @@
# Optimization and Tuning
> For future consideration.
The framework should support optimization of agents and workflows
with task feedback, by tuning the various components such as
system prompts, model parameters, and tool configurations.
We should also consider fine-tuning of the models and embeddings
as part of the optimization process.
+29
View File
@@ -0,0 +1,29 @@
# Threads
Threads are stateful objects to manage the conversation context of an agent or a workflow.
They are meant to be shown to the user as part of a user interface.
They can be persisted to a database or a file system, and used to
resume a previous user session.
Thread should use message and content types as defined in [Core Data Types](types.md).
A thread can contain sub-threads as a dictionary of threads.
This is to ensure agents in a workflow can run concurrently on different threads.
The default thread has the key `main` and the sub-threads having keys that are usually
corresponding to the agents in a workflow.
For workflows, thread should also support the concept of execution state, which includes:
- The history of steps taken.
- The current step in the workflow.
- The next steps to be taken.
This is to ensure the workflow can be resumed from where it left off, without losing
the state of execution.
The framework should provides default implementations of a thread class that:
- Can be backed by a database (i.e., Redis) or a file system (i.e., JSON file).
- Can be backed by the Foundry Agent Service.
- Can be copied and forked.
- Can be serialized and deserialized to/from JSON.
- Can support checkpointing, rollback, and time travel, for both agent and workflow.
- Can automantically export truncated views to be used by model clients to keep the context size within limits.
+215
View File
@@ -0,0 +1,215 @@
# Tools
> The design goal is to make it easy to create new tools and integrate existing APIs and make them available to agents.
A tool is a component that can be used to invoke procedure code
and returns a well-defined result type to the caller.
The result type should indicate the success or failure of the invocation,
as well as the output of the invocation in terms of the core data types.
There may be other fields in the result type for things like
side effects, etc.. We should address this when designing the
tool interface.
A tool may have arguments for invocation.
The arguments must be defined using JSON schema that language model supports.
A tool may have dependencies such as tokens, credentials,
or output message channels that will be provided by through
a context variable passed to the tool when it is invoked.
A tool may also have guardrails that are used to ensure the
tool is invoked with proper arguments, or that the agent has the
right context such as human approval to invoke the tool.
The framework provides a set of pre-built tools:
- `FunctionTool`: a tool that wraps a function.
- `AzureAISearchTool`: a tool that is backed by Azure AI Search Service.
- `OpenAPITool`: a tool that is backed by a service that defines an OpenAPI spec.
- Other tools backed by Foundry.
## `Tool` base class
```python
@dataclass
class ToolResult:
"""The result of running a tool."""
is_error: bool
output: List[ImageContent | TextContent] # The content types are defined as part of the core data types.
... # Other fields, could be extended to include more for application-specific needs.
class Tool(ABC):
"""The base class for all tools in the framework."""
@property
def name(self) -> str:
"""The name of the tool, used to identify it in the system."""
...
@property
def description(self) -> str:
"""The description of the tool, used to provide information about its
functionality.
"""
...
@property
def schema(self) -> ToolSchema:
"""The schema of the tool, which defines the JSON schema of the input
arguments."""
...
@property
def strict(self) -> bool:
"""Whether the JSON schema is in strict mode. If true, no optional
arguments are allowed.
"""
...
async def __call__(
self,
call: ToolCall,
context: Context,
) -> ToolResult:
"""The method to call to run the tool with arguments and return the result.
Args:
call: The tool call containing the name and arguments to pass to the tool.
context: The context for the current invocation of the tool, providing
access to the event channel, and human-in-the-loop (HITL) features.
Returns:
The result of running the tool.
"""
try:
# Call the on_invoke method to allow for input guardrails to be applied
# to the arguments before the tool is run.
await self.on_invoke(args, context)
# Call the run method to actually run the tool.
result = await self.run(args, context)
# Call the on_output method to handle the output of the tool.
result = await self.on_output(result, context)
except Exception as e:
# If an error occurs, call the on_error method to handle it.
result = await self.on_error(e, context)
return result
@abstractmethod
async def run(
self,
calls: ToolCall,
context: Context,
) -> ToolResult:
"""The method called by the tool itself to run the tool with arguments and return the result."""
...
async def on_invoke(
self,
calls: ToolCall,
context: Context,
) -> None:
"""The method called by the tool when is invoked but before it is run.
This is useful for input guardrails to be applied to the arguments
before the tool is run.
"""
...
async def on_error(
self,
error: Exception,
context: Context,
) -> ToolResult:
"""The method called by the tool when an error is raised."""
...
async def on_output(
self,
output: ToolResult,
context: Context,
) -> ToolResult:
"""The method called by the tool when the output is ready.
This is where output guardrails can be applied to the result
before it is returned to the caller.
"""
...
def add_input_guardrails(
self,
guardrails: list[InputGuardrail[ToolCall]]
) -> None:
"""Add input guardrails to the tool.
Args:
guardrails: The list of input guardrails to add.
"""
...
def add_output_guardrails(
self,
guardrails: list[OutputGuardrail[ToolResult]]
) -> None:
"""Add output guardrails to the tool.
Args:
guardrails: The list of output guardrails to add.
"""
...
def add_on_error_func(
self,
on_error_func: Callable[[Exception, Context], Awaitable[ToolResult]]
) -> None:
"""Add a function to call when an error is raised during the call to `run`.
Args:
on_error_func: The function to call when an error is raised.
"""
...
```
## `FunctionTool`
The `FunctionTool` is a decorator that can be used to create a tool from a function.
```python
@FunctionTool
def web_search(
query: str,
num_results: int = 10,
) -> str:
"""A tool that performs a web search and returns the results."""
...
```
`FunctionTool` supports customization of the following:
- `name`
- `description`
- `on_error_func`: the function to call when an error is raised during the call to `run`.
- `strict`: whether the JSON schema is in strict mode. If true, no optional
arguments are allowed.
- `input_guardrails`: a list of input guardrails to apply to the arguments
before the tool is run.
- `output_guardrails`: a list of output guardrails to apply to the result
before it is returned.
## `AgentTool`
The `AgentTool` is a wrapper around an agent that can be used as a tool.
```python
agent = SomeAgent(...)
tool = AgentTool(
agent=agent,
name="SomeAgent",
description="Some description of this agent tool.",
output_extractor=..., # Optional, a function to extract a ToolResult from the agent's run Result.
on_error_func=..., # Optional, a function to call when an error is raised during the call to `run`.
)
```
The argument to the `AgentTool` is a single string.
> NOTE: Do we also need to support passing a thread to the agent tool?
+118
View File
@@ -0,0 +1,118 @@
# Core Data Types
A design goal of the new framework to simplify the interaction between agent components
through a common set of data types, minimizing boilerplate code
in the application for transforming data between components.
For example, text, images, function calls, tool schema are
all examples of such data types.
These data types are used to interact with agent components (model clients, tools, MCP, threads, and memory),
forming the connective tissue between those components.
In AutoGen, these are the data types mostly defined in `autogen_core.models` module,
and others like `autogen_core.Image` and `autogen_core.FunctionCall`. This is just
an example as AutoGen has no formal definition of model context.
To start, we should follow [MEAI](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.ai?view=net-9.0-pp).
This document describes the data types from Python perspective,
while for .NET, we should directly use the MEAI data types.
## Content types
```python
class AIContent(ABC):
"""Base class for all AI content types."""
additional_properties: Dict[str, Any] = field(default_factory=dict)
"""Additional properties for extensibility, allowing custom fields."""
class DataContent(AIContent):
"""Data content type."""
data: bytes # Raw binary data.
media_type: str # MIME type of the data, e.g., "image/png", "application/json"
uri: str # URI constructed from the data.
base64: str # Base64 encoded data for easy transport.
class ErrorContent(AIContent):
"""Error content type."""
details: str # Detailed error message.
error_code: str # Error code for programmatic handling.
message: str # Human-readable error message.
class FunctionCallContent(AIContent):
"""Function call content type."""
name: str # Name of the function to call.
arguments: Dict[str, Any] # Arguments for the function call, serialized as JSON.
call_id: str # Unique identifier for the function call.
exception: Optional[Exception] = None # Optional exception for error occurred while mapping the original function call data to this content type.
class FunctionResultContent(AIContent):
"""Function result content type."""
call_id: str # Unique identifier for the function call.
result: Any # Result of the function call, or a generic error message.
exception: Optional[Exception] = None # Optional exception for error occurred while executing the function call.
class TextContent(AIContent):
"""Text content type."""
text: str
class TextReasoningContent(AIContent):
"""Text reasoning content type."""
text: str
class UriContent(AIContent):
"""URI content type."""
uri: str # URI of the content, e.g., a link to an image or document.
media_type: str # MIME type of the content, e.g., "image/png", "application/pdf".
class UsageDetails:
input_token_count: Optional[int] = None
output_token_count: Optional[int] = None
additional_counts: Optional[Dict[str, int]] = None
total_token_count: Optional[int] = None
class UsageContent(AIContent):
"""Usage content type."""
details: UsageDetails
```
## `ChatMessage`
A message in a thread that is sent to or received from a model client.
> Should we use `Message` instead of `ChatMessage`?
> We may need to extend this class to support more framework-level functionalities
> such as handoff, stopping, and so on?
```python
class ChatRole(Enum):
"""The role of the author in a chat message."""
USER = "user"
ASSISTANT = "assistant"
SYSTEM = "system"
TOOL = "tool"
class ChatMessage:
message_id: str # Unique identifier for the message.
author: str # Unique identifier for the author of the message.
role: ChatRole # Role of the author in the chat, e.g., user, assistant, system, tool.
contents: List[AIContent] # List of content types in the message, e.g., text, images, function calls.
```
## Tool types
Align with the [MEAI tool types](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.ai.aifunction?view=net-9.0-pp)
in terms of the core attributes and methods.
See [Tools](./tools.md) for more details.
## Model client types
Align with the MEAI model client types in terms of the core attributes and methods.
See [Models](./models.md) for more details.
+38
View File
@@ -0,0 +1,38 @@
# Vector Stores and Embedding Clients
A vector store is component that provides a unified interface for
interacting with different vector databases, similar to model clients.
It exposes indexing and querying methods, including vector, text-based
and hybrid queries.
The details can be filled in based on the existing vector abstraction
in Semantic Kernel.
The framework provides pre-built vector stores (already exist in
Semantic Kernel):
- Azure AI Search
- Cosmos DB
- Chroma
- Couchbase
- Elasticsearch
- Faiss
- In-memory
- JDBC
- MongoDB
- Pinecone
- Postgres
- Qdrant
- Redis
- SQL Server
- SQLite
- Volatile
- Weaviate
Many vector store implementations will require embedding clients
to function. An embedding client is a component that implements a unified interface
to interact with different embedding models.
The framework provides a set of pre-built embedding clients:
- TBD.
+201
View File
@@ -0,0 +1,201 @@
# Workflow
The design goal is to create workflows that can be specified in a declarative
way to allow for easy creation and modification without needing to change the
underlying code.
## `Workflow` is Agent
A `Workflow` is an agent composed of other agents. It follows the same interface
as an agent. This allows for nested workflows, where a workflow can contain other
workflows.
## Agents in a `Workflow`
Each agent (or a `Workflow`) in a `Workflow` has a thread on which it will
always run. The thread may be privated, or shared among some or all of the agents.
When do agents share a `thread`?
- When an agent is called through handoff or as a tool by another agent, the caller
agent's thread may be shared with the callee agent.
When do agents not share a `thread`?
- When a set of worker agents are called through a "fan-out" and "fan-in" pattern, where the worker
agents are called in parallel and the results are combined by an aggregator agent.
Thread sharing can be configured through the `Workflow`'s constructor.
By default, each agent has its own private thread and no sharing.
See [Threads](threads.md) for more details on how threads work.
## `Workflow` from control flow graph
A `Workflow` can be created from a control flow graph of agents.
The graph is a directed graph where each node is an agent and each edge
is a transition between agents. The graph can contain loops
and conditional transitions.
The control flow graph specifies the order in which agents are called
and the conditions under which they are called.
```python
# Create agent instances.
agent1 = MCPAgent(
model_client="OpenAIChatCompletionClient",
mcp_server=["MCPServer1", "MCPServer2"],
)
agent2 = MCPAgent(
model_client="OpenAIChatCompletionClient",
mcp_server=["MCPServer3", "MCPServer4"],
)
agent3 = MCPAgent(
model_client="OpenAIChatCompletionClient",
mcp_server="MCPServer5",
)
# Create a directed graph of agents with conditional loops and transitions.
# The graph builder validates the graph.
graph = GraphBuilder() \
.add_agent(agent1) \
.add_agent(agent2) \
.add_agent(agent3) \
.add_loop(agent1, agent2, conditions=Any(...)) \
.add_transition(agent2, agent3, conditions=Any(..., All(...))]) \
.build()
# Create a workflow from the graph.
workflow = Workflow(graph=graph)
```
## `Workflow` from message router
By default, each message is delivered to an _inbox_ of every agent in a `Workflow`.
When an agent is called, the inbox is cleared and the messages are added
to the thread that is used by the agent.
If multiple agents share a thread, each message is added exactly once to the thread.
To customize the message flow, we can configure how each inbox behaves.
Each agent's inbox can be configured to only accept messages from a specific sender(s).
We can also configure the inbox batch size, time-to-live for messages in the inbox
and various other parameters that controls how the inbox is processed.
The configuration of agents' inboxes is done using a `Router` object,
which can be built using a `RouterBuilder` object.
```python
graph = ...
router = RouterBuilder() \
.add_route(source=agent1, target=agent2) \ # Agent2 will receive messages from agent1.
.add_route(source=[agent1, agent2], target=agent3, batch_size=10, ttl="1h") \ # Agent3 will receive messages from agent1 and agent2, with a batch size of 10 and a time-to-live of 1 hour.
.add_route(source=Router.ANY, target=agent4) \ # Agent4 will receive all messages.
).build()
# Create a workflow from the graph and router.
workflow = Workflow(graph=graph, router=router)
```
You can also skip the graph all together and just create a workflow from the router.
In this case, all agents will run concurrently to process the messages delivered
to their inboxes, according to the inbox rules.
```python
# Create a workflow from the router.
workflow = Workflow(router=router)
```
The validation of the router is done as part of the workflow creation, to ensure
that no gap exists in the routing, and warning for cascading routes.
## Run `Workflow`
It is the same as running an agent.
```python
# Create a message batch to send to the workflow.
# The run context is used to pass in the event channel and other context
# shared by the agents.
thread = [
Message("Hello"),
Message("Can you find the file 'foo.txt' for me?"),
]
context = RunContext(event_channel="console")
result = await workflow.run(thread, context=context)
```
## `Workflow` has a final response
A `Workflow` is expected to have a final response, which is the final response in the
result of the last agent in the workflow. The final response is returned as part of the
`Result` object returned by the `run` method.
This is to ensure the workflow can be used in the same way as an agent.
## Stopping `Workflow`
A `workflow` may run indefinitely, so it is important to have a way to stop it.
```python
# Use a stopping condition to stop the workflow when the condition is met.
# Detail design TBD.
condition = StopCondition(
condition=Any(...),
timeout="1h",
)
workflow = Workflow(graph=graph, stop_condition=condition)
```
TBD.
## `Workflow` can be stateless
The workflow state is kept in the thread object as input to the `run` method.
If not provided, the workflow will create new sub-threads for each agent
in the workflow for their private threads, otherwise, the workflow will
use the provided sub-thread.
```python
# Create a workflow with a graph and router.
workflow = Workflow(graph=graph, router=router, stop_condition=condition)
# Create a new thread.
thread = [
Message("Hello"),
Message("Can you find the file 'foo.txt' for me?"),
]
# Run the workflow.
result = await workflow.run(thread, context=context)
# Update the thread with new messages from the user.
thread = result.thread + [
Message("Can you find the file 'bar.txt' for me?"),
]
# Resume the workflow from where it left off.
result = await workflow.run(thread, context=context)
```
Read more about [Threads](threads.md) for more details on threads.
## Pre-defined workflows
The framework ships with a few pre-defined workflows for common orchestration
patterns. These workflows can be used as-is or as a starting point for
new developers, however, when using them, you should be aware of the underlying
implementation and move on to custom workflows when a limit is reached.
The pre-defined workflows are:
- `Sequential`: A sequential workflow that calls each agent in order,
its message flow can be configured separately.
- `MapReduce`: A map-reduce workflow that splits a task into smaller
tasks, runs them in parallel and then combines the results.
- `RoundRobinGroupChat`: agents are called in a round-robin fashion in a loop.
- `SelectorGroupChat`: agents are selected on each iteration by the workflow's built-in
LLM based selector.
- `Swarm`: use handoffs.
The predefined workflows are implemented as subclasses of the `Workflow` class.