mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
Design doc draft (#5)
* wip * wip * wip * wip * wip * wip * Update docs/design/main.md Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com> * Update docs/design/main.md Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com> * wip * wip * wip * wip * wip * wip * wip * wip * wip * update * update * update * wip * wip * wip * wip * address comment * update * add custom agent example * address comment * update code teaser * Update docs/design/main.md Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com> * update * address comments * update guardrails * address some of mark's comments * add new separate sections for agents and workflows * update agent doc * Update agent.md Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com> * add foundry agent doc * wip * refine the component registration interface with agent runtime * update * workflows * update * update * Update * Update * update * Update design doc to remove runtime * Update * Update * Update * update * Add eval section notes (#9) * add notes on eval * remove duplicate title * update docs * update docs * save updates before merge * update evaluation script * Update agents.md * update workflows * Update Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com> * update workflow * Updated design doc * Update * Update * update * update * Update * update * update * Update * update * Update with agent abstraction alternatives * Update discussion * Update * update * Update * Update * Update * Update --------- Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com> Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com> Co-authored-by: Victor Dibia <chuvidi2003@gmail.com>
This commit is contained in:
committed by
GitHub
Unverified
parent
d0531bb93b
commit
6089446f04
@@ -0,0 +1,103 @@
|
||||
# Agent Framework Design Doc
|
||||
|
||||
What values does the framework provide?
|
||||
|
||||
- A set of configurable, extensible and high-quality components (e.g., model clients, tools, MCP servers and memory).
|
||||
- An easy path for deploying, securing and scaling applications, both locally and in the cloud.
|
||||
- Integration with tools for monitoring, debugging, evaluation and optimization, both locally and in the cloud.
|
||||
- A community of developers and users for support, ideas, and contributions, benefiting everyone in the ecosystem.
|
||||
|
||||
What is this document?
|
||||
|
||||
- An overview of the new framework.
|
||||
- Defining the major elements of the framework and their relationships.
|
||||
- Detailed design of each element and its implementation will be in a separate document.
|
||||
|
||||
## Core Data Types
|
||||
|
||||
To unify the interaction between components, we define a set of core
|
||||
data types that are used throughout the framework.
|
||||
|
||||
See [Core Data Types](types.md) for more details.
|
||||
|
||||
## Components
|
||||
|
||||
A component is a class that provides a specific functionality and can be used
|
||||
independently by applications.
|
||||
There are two types of components in the framework: agent components and agents. Agent components are the building blocks of agents, while agents are
|
||||
the higher-level components, and can be composed from agent components
|
||||
and other agents (as in workflows).
|
||||
|
||||
The framework defines the following components. Follow the links to
|
||||
find the design details of each component:
|
||||
|
||||
- Agent Components:
|
||||
- [Model Client](models.md)
|
||||
- [Vector Store and Embedding Client](vector-stores.md)
|
||||
- [Tool](tools.md)
|
||||
- [MCP Server](mcp-servers.md)
|
||||
- [Memory](memory.md)
|
||||
- [Thread](threads.md)
|
||||
- [Guardrail](guardrails.md)
|
||||
- Agent and Workflow:
|
||||
- [Agent](agents.md)
|
||||
- [Workflow](workflows.md)
|
||||
|
||||
### Composition
|
||||
|
||||
Components can be composed to create complex components. For example,
|
||||
an agent can be composed from model clients, tools and memory,
|
||||
and a tool can be composed from an agent or a workflow.
|
||||
It is the responsibility of the framework to validate components
|
||||
and their composition.
|
||||
|
||||
### Configuration
|
||||
|
||||
A component can be created from a set of serializable configuration parameters,
|
||||
with the help of dependency injection to resolve non-serializable dependencies.
|
||||
|
||||
### Relationships
|
||||
|
||||
The following diagram shows the component relationship of the framework:
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Component[Component] --> |extends| Agent[Agent]
|
||||
Agent --> |extends| Workflow[Workflow]
|
||||
|
||||
Component --> |extends| ModelClient[Model Client]
|
||||
Component --> |extends| VectorStore[Vector Store]
|
||||
Component --> |extends| EmbeddingClient[Embedding Client]
|
||||
Component --> |extends| Tool[Tool]
|
||||
Component --> |extends| MCPServer[MCP Server]
|
||||
Component --> |extends| Memory[Memory]
|
||||
Component --> |extends| Thread[Thread]
|
||||
Component --> |extends| Guardrail[Guardrail]
|
||||
|
||||
Agent --> |uses| uses1[Model Client]
|
||||
Agent --> |uses| uses2[Thread]
|
||||
Agent --> |uses| uses3[Tools and MCP Servers]
|
||||
Agent --> |uses| uses4[Memory]
|
||||
Agent --> |uses| uses5[Guardrail]
|
||||
|
||||
Workflow --> |contains| contains[Child Agents]
|
||||
|
||||
Memory --> |uses| uses5[Vector Store]
|
||||
VectorStore --> |uses| uses6[Embedding Client]
|
||||
```
|
||||
|
||||
## Deployment and Scaling
|
||||
|
||||
[Deployment](deployment.md).
|
||||
|
||||
## Observability and Monitoring
|
||||
|
||||
[Observability](observability.md).
|
||||
|
||||
## Evaluation
|
||||
|
||||
[Evaluation](evaluation.md).
|
||||
|
||||
## Optimization
|
||||
|
||||
[Optimization](optimization.md).
|
||||
@@ -0,0 +1,476 @@
|
||||
# Agents
|
||||
|
||||
An agent is a component that processes messages in a thread and returns a result.
|
||||
|
||||
During its handling of messages, an agent may:
|
||||
|
||||
- Use model client to process messages,
|
||||
- Use thread to keep track of the interaction with the model,
|
||||
- Invoke tools or MCP servers, and
|
||||
- Retrieve and store data through memory.
|
||||
|
||||
It is up to the implementation of the agent class to decide how these components are used.
|
||||
|
||||
__An important design goal of the framework is to ensure the developer experience
|
||||
of creating custom agent is as easy as possible.__ Existing frameworks
|
||||
have made "kitchen-sink" agents that are hard to understand and maintain.
|
||||
|
||||
An agent might not use the components provided by the framework to implement
|
||||
the agent interface.
|
||||
Azure AI Agent is an example of such agent: its implementation is
|
||||
backed by the Azure AI Agent Service.
|
||||
|
||||
The framework provides a set of pre-built agents:
|
||||
|
||||
- `ChatCompletionAgent`: an agent that uses a chat-completion model to process messages
|
||||
and use thread, memory, tools and MCP servers in a configurable way. __If we can make
|
||||
custom agents easy to implement, we can remove this agent.__
|
||||
- `AzureAIAgent`: an agent that is backed by Azure AI Agent Service.
|
||||
- `ResponsesAgent`: an agent that is backed by OpenAI's Responses API.
|
||||
- `A2AAgent`: an agent that is backed by the [A2A Protocol](https://google.github.io/A2A/documentation/).
|
||||
|
||||
## `Agent` protocol
|
||||
|
||||
```python
|
||||
class Agent(Protocol):
|
||||
"""The protocol for all agents in the framework."""
|
||||
|
||||
async def run(
|
||||
self,
|
||||
thread: Thread,
|
||||
context: Context,
|
||||
) -> Result:
|
||||
"""The method to run the agent on a thread of messages, and return the result.
|
||||
|
||||
Args:
|
||||
thread: The thread of messages to process: it may be a local thread
|
||||
or a stub thread that is backed by a remote service.
|
||||
context: The context for the current invocation of the agent, providing
|
||||
access to the event channel, and human-in-the-loop (HITL) features.
|
||||
|
||||
Returns:
|
||||
The result of running the agent, which includes the final response.
|
||||
"""
|
||||
...
|
||||
|
||||
|
||||
@dataclass
|
||||
class Context:
|
||||
"""The context for the current invocation of the agent."""
|
||||
event_handler: EventHandler
|
||||
"""The event consumer for handling events emitted by the agent."""
|
||||
user_input_source: UserInputSource
|
||||
"""The user input source for requesting for user input during the agent run."""
|
||||
... # Other fields, could be extended to include more for application-specific needs.
|
||||
|
||||
|
||||
@dataclass
|
||||
class Result:
|
||||
"""The result of running an agent."""
|
||||
final_response: Message
|
||||
... # Other fields, could be extended to include more for application-specific needs.
|
||||
```
|
||||
|
||||
## `ToolCallingAgent` example
|
||||
|
||||
Here is an example of a custom agent that calls a tool and returns the result.
|
||||
The `ToolCallingAgent` implements the `Agent` base class and
|
||||
it implements the `run` method to process incoming messages and call tools if needed.
|
||||
|
||||
```python
|
||||
class ToolCallingAgent(Agent):
|
||||
def __init__(
|
||||
self,
|
||||
model_client: ModelClient,
|
||||
tools: list[Tool],
|
||||
) -> None:
|
||||
self.model_client = model_client
|
||||
self.tools = tools
|
||||
|
||||
async def run(self, thread: Thread, context: Context) -> Result:
|
||||
# Create a response using the model client, passing the thread and context.
|
||||
create_result = await self.model_client.create(thread, context, tools=self.tools)
|
||||
# Emit the event to notify the workflow consumer of a model response.
|
||||
await context.emit(ModelResponseEvent(create_result))
|
||||
if create_result.is_tool_call():
|
||||
# Get user approval for the tool call through the context.
|
||||
approval = await context.get_user_approval(create_result.tool_calls)
|
||||
if not approval:
|
||||
# ... return a canned response.
|
||||
# Call the tools with the tool calls in the response.
|
||||
tools = ... # Find the tool by name in the tools list.
|
||||
tool_results = ... # Call the tool with the tool call arguments.
|
||||
# Emit the event to notify the workflow consumer of a tool call.
|
||||
await context.emit(ToolCallEvent(tool_result))
|
||||
# Update the thread with the tool result.
|
||||
await thread.append(tool_result.to_messages())
|
||||
# Return the tool result as the response.
|
||||
return Result(
|
||||
final_response=tool_result,
|
||||
)
|
||||
else:
|
||||
# Return the response as the result.
|
||||
return Result(
|
||||
final_response=create_result,
|
||||
)
|
||||
```
|
||||
|
||||
Things to note in the implementation of the `run` method:
|
||||
- Orchestration of tools and model is completly customizable.
|
||||
- Components such as `thread` and `model_client` interacts smoothly with little boilerplate code.
|
||||
- The `context` parameter provides convenient access to the workflow run fixtures such as event channel.
|
||||
|
||||
An agent doesn't need to use components provided by the framework to implement the agent interface.
|
||||
|
||||
For example, in a multi-agent workflow, we may need a verification agent in a using deterministic
|
||||
logic to critic another agent's response.
|
||||
|
||||
```python
|
||||
class CriticAgent(Agent):
|
||||
def __init__(self) -> None:
|
||||
self.verification_logic = ... # Some verification logic, e.g. a set of rules.
|
||||
|
||||
async def run(self, thread: Thread, context: Context) -> Result:
|
||||
# Use the verification logic to verify the last message in the thread.
|
||||
response = thread.get_last_message()
|
||||
is_verified = self.verification_logic.verify(response)
|
||||
if is_verified:
|
||||
final_response = Message("The response is verified.")
|
||||
else:
|
||||
final_response = Message("The response is not verified.")
|
||||
|
||||
return Result(
|
||||
final_response=final_response,
|
||||
)
|
||||
```
|
||||
|
||||
## Run
|
||||
|
||||
A _run_ is a single invocation of the agent or a workflow given a thread of messages.
|
||||
|
||||
## Run agent
|
||||
|
||||
Developer can instantiate a subclass of `Agent` directly using it's constructor,
|
||||
and run it by calling the `run` method.
|
||||
|
||||
```python
|
||||
@FunctionTool
|
||||
def my_tool(input: str) -> str:
|
||||
return f"Tool result for {input}"
|
||||
|
||||
model_client = OpenAIChatCompletionClient("gpt-4.1")
|
||||
agent = ToolCallingAgent(
|
||||
model_client=model_client,
|
||||
tools=[my_tool],
|
||||
)
|
||||
|
||||
# Create a thread for the current task.
|
||||
thread = [
|
||||
Message("Hello"),
|
||||
Message("Can you find the file 'foo.txt' for me?"),
|
||||
]
|
||||
|
||||
# Create a context that uses a handler that prints emitted events to the console,
|
||||
# and a user input source that reads from the console.
|
||||
context = Context(event_handler=ConsoleEventHandler(), user_input_source=ConsoleUserInputSource())
|
||||
|
||||
# Run the agent with the thread and context.
|
||||
result = await agent.run(thread, context)
|
||||
```
|
||||
|
||||
## User session
|
||||
|
||||
A user session is a logical concept which involves a sequence of messages exchanged between the user and the agent.
|
||||
Consider the following examples:
|
||||
|
||||
- A chat session in ChatGPT.
|
||||
- A delegation of task to a workflow agent from a user, with data exchanged between the user
|
||||
and the workflow such as occassional feedbacks from the user and status updates from the workflow.
|
||||
|
||||
A user session may involve multiple runs.
|
||||
|
||||
|
||||
## User session state
|
||||
|
||||
Rather than classifying agents as stateless or stateful, we focus on how state is managed during a user session.
|
||||
|
||||
There are several states that an application may maintain during a user session:
|
||||
- **Conversation or workflow state**. This is the conversation history or execution
|
||||
history in a workflow. This state is typically owned and managed by the thread object.
|
||||
- **Long-term memory**. This can be information relevant to the user,
|
||||
such as user preferences, past interactions, or other relevant data.
|
||||
This can also be information relevant to the task, such as past trajectories,
|
||||
past results, or other task-related data. These states are typically
|
||||
owned and managed by a memory object.
|
||||
|
||||
The thread is always passed through the agent's `run` method.
|
||||
Whereas the memory is can be set through the constructor of the agent.
|
||||
|
||||
See the [Memory](memory.md) design document for more details on how memory
|
||||
is used in the framework.
|
||||
|
||||
It is up to the application to decide whether to reuse state across different
|
||||
user sessions. The framework should provide the necessary methods and storage layer integration
|
||||
for persisting and retrieving state, but the application should decide how to use them.
|
||||
|
||||
## Run agent concurrently
|
||||
|
||||
If the agent just call models and tools that are stateless,
|
||||
we can run the same instance of the agent concurrently.
|
||||
|
||||
```python
|
||||
# Create threads for concurrent tasks.
|
||||
thread1 = [
|
||||
Message("Hello"),
|
||||
Message("Can you find the file 'foo.txt' for me?"),
|
||||
]
|
||||
thread2 = [
|
||||
Message("Hello"),
|
||||
Message("Can you find the file 'bar.txt' for me?"),
|
||||
]
|
||||
|
||||
# Run the agent concurrently on multiple threads.
|
||||
results = await asyncio.gather(
|
||||
agent.run(thread1, context),
|
||||
agent.run(thread2, context),
|
||||
)
|
||||
# The `context`'s event handlers will emit events from both runs.
|
||||
```
|
||||
|
||||
This is not always the right way to run concurrent agents, as some tools
|
||||
or memory associated with the agent may not be concurrent-safe.
|
||||
|
||||
It is up the application to decide if an agent can run concurrently,
|
||||
or multiple instances should be created for each thread.
|
||||
|
||||
|
||||
## Using Foundry Agent Service
|
||||
|
||||
The framework offers a built-in agent class for users of the Foundry Agent Service.
|
||||
The agent class essentially acts as a proxy to the agent hosted by the Foundry Agent Service.
|
||||
|
||||
```python
|
||||
agent = FoundryAgent(
|
||||
name="my_foundry_agent",
|
||||
project_client="ProjectClient",
|
||||
agent_id="my_agent_id", # If not provided, a new agent will be created.
|
||||
deployment_name="my_deployment",
|
||||
instruction="my_instruction",
|
||||
... # Other parameters for the agent creation.
|
||||
)
|
||||
|
||||
# Create a thread that is backed by the Foundry Agent Service.
|
||||
thread = FoundryThread(thread_id="my_thread_id")
|
||||
|
||||
# Run the agent on the thread and an new context that emits events to the console.
|
||||
result = await agent.run(thread, RunContext(event_channel="console"))
|
||||
```
|
||||
|
||||
## Alternative agent abstractions
|
||||
|
||||
There are two alternatives:
|
||||
|
||||
1. **Agent with private conversation state**: The agent manages its own conversation state,
|
||||
either by using a thread or other custom logics. The conversation state is
|
||||
not shared with other agents or workflows. It is up to the agent to decide how
|
||||
to manage the conversation state.
|
||||
2. **Agent without conversation state**: The conversation state is externalized
|
||||
and managed by a thread abstraction. The agent is invoked with a thread on
|
||||
every run. While it can still use the thread to append messages etc., it loses
|
||||
control over the conversation state the moment the run method returns.
|
||||
|
||||
### Protocol comparison
|
||||
|
||||
For agent with private conversation state, agent is invoked with new messages
|
||||
and the agent is responsible for managing the conversation state while exposing
|
||||
public methods for the orchestration code to manipulate its conversation state
|
||||
indirectly.
|
||||
|
||||
```python
|
||||
class Agent(Protocol):
|
||||
|
||||
async def run(
|
||||
self,
|
||||
messages: list[Message],
|
||||
context: Context,
|
||||
) -> Result:
|
||||
"""The method to run the agent and return the result.
|
||||
|
||||
Args:
|
||||
messages: The list of new messages to process.
|
||||
context: The context for the current invocation of the agent, providing
|
||||
access to the event channel, and human-in-the-loop (HITL) features.
|
||||
|
||||
Returns:
|
||||
The result of running the agent, which includes the final response.
|
||||
"""
|
||||
...
|
||||
|
||||
async def reset() -> None:
|
||||
"""Reset the conversation state of the agent."""
|
||||
...
|
||||
|
||||
# And other methods for managing the conversation state.
|
||||
```
|
||||
|
||||
For agent without conversation state, the agent is invoked with a thread
|
||||
and the agent is responsible for processing the messages in the thread.
|
||||
|
||||
```python
|
||||
class Agent(Protocol):
|
||||
|
||||
async def run(
|
||||
self,
|
||||
thread: Thread,
|
||||
context: Context,
|
||||
) -> Result:
|
||||
"""The method to run the agent on a thread of messages, and return the result.
|
||||
|
||||
Args:
|
||||
thread: The current conversation state.
|
||||
context: The context for the current invocation of the agent, providing
|
||||
access to the event channel, and human-in-the-loop (HITL) features.
|
||||
Returns:
|
||||
The result of running the agent, which includes the final response.
|
||||
"""
|
||||
...
|
||||
```
|
||||
|
||||
### Constructor comparison
|
||||
|
||||
For agent with private conversation state, the agent is initialized with
|
||||
the a state in addition to components like model client and tools, which could be a thread passed to the constructor,
|
||||
or a custom state object that the agent uses to manage its conversation state.
|
||||
|
||||
```python
|
||||
class CustomAgent(Agent):
|
||||
def __init__(self,
|
||||
model_client: ModelClient,
|
||||
tools: list[Tool],
|
||||
state: CustomState, # Could be a thread or a custom state object, or nothing at all.
|
||||
) -> None:
|
||||
self.model_client = model_client
|
||||
self.tools = tools
|
||||
self.state = state # Could be created by the agent within the constructor.
|
||||
|
||||
```
|
||||
|
||||
For agent without conversation state, the agent is initialized with
|
||||
the components it needs to process messages, such as a model client and tools.
|
||||
|
||||
```python
|
||||
class CustomAgent(Agent):
|
||||
def __init__(
|
||||
self,
|
||||
model_client: ModelClient,
|
||||
tools: list[Tool],
|
||||
) -> None:
|
||||
self.model_client = model_client
|
||||
self.tools = tools
|
||||
```
|
||||
|
||||
### Thread-Agent compatibility considerations
|
||||
|
||||
For agent with private conversation state, compatibility with thread is not a concern,
|
||||
as this is completely managed by the agent itself.
|
||||
|
||||
For agent without conversation state, the thread must be compatible with the agent's
|
||||
`run` method. For example, a `FoundryAgent` must work with a `FoundryThread`
|
||||
because the thread is backed by the Foundry Agent Service, and the implementation
|
||||
requires the thread to be compatible with the service's API.
|
||||
|
||||
Compatibility constraints:
|
||||
- `FoundryAgent` must work with `FoundryThread`.
|
||||
- `OpenAIAssistantAgent` must work with `OpenAIAssistantThread`.
|
||||
- `ResponsesAgent` must work with `ResponsesThread`, when using the stateful mode of the Responses API.
|
||||
|
||||
### Workflow-Agent compatibility considerations
|
||||
|
||||
For agent with private conversation state, the orchestration code cannot directly
|
||||
modifies the conversation state of every agent in the workflow.
|
||||
This means that for resetting the conversation state, branching a conversation,
|
||||
or other orchestration logic, the agent must provides public
|
||||
methods for the orchestration code to manipulate its conversation state.
|
||||
|
||||
Potential methods (just initial ideas):
|
||||
- `reset()` to reset the conversation state.
|
||||
- `branch()` to create a new branch of the conversation state from an existing state.
|
||||
|
||||
Example: AutoGen's MagenticOne orchestration requires the agents to be able to
|
||||
reset their conversation states during re-planning. It is reasonable to expect
|
||||
other types of orchestration logic will require behavior like branching
|
||||
or backtracking.
|
||||
|
||||
For agent without conversation state, the orchestration code can directly
|
||||
manipulate the thread that is passed to the agent's `run` method. So the orchestration code
|
||||
can clone, fork, or reset the thread as needed.
|
||||
This also means that the agent's converstion state must be abstracted as a thread.
|
||||
|
||||
### Extensibility considerations
|
||||
|
||||
For agent with private conversation state, the management of the conversation state
|
||||
is completely up to the agent implementation. This means that custom agents can
|
||||
be created with different conversation state management strategies, such as:
|
||||
- Using a custom thread implementation that provides additional features.
|
||||
- Using a custom state object that provides additional features.
|
||||
When using a custom state object, the developer must also implement
|
||||
methods for exporting and importing the state.
|
||||
|
||||
For agent without conversation state, the thread abstraction is required to
|
||||
encapsulate the conversation state and ensure that the agent's `run` method
|
||||
can use it without any issues. This puts a constraint on the agent implementation,
|
||||
and also what can be represented as state in the thread.
|
||||
Though, if the thread abstraction is designed well, it relieves the developer
|
||||
from implementing the conversation state management logic themselves.
|
||||
The developer only needs to come up with custom thread when the built-in thread
|
||||
abstraction does not work with their custom agent.
|
||||
|
||||
### Discussion
|
||||
|
||||
- Either agent or thread must manage the conversation state.
|
||||
- The class that manages the conversation state must provide a way to manipulate
|
||||
it for orchestration purposes.
|
||||
- Isolate thread as a separate required abstraction may introduce compatibility
|
||||
issues.
|
||||
- A thread abstraction with methods for manipulating the conversation state
|
||||
should always be provided by the framework, whether it is exposed again
|
||||
through the agent or not.
|
||||
|
||||
In a scenario with built-in agents and built-in threads, the developer experience
|
||||
is nearly identical except for agent without conversation state the developer
|
||||
must ensure the thread is compatible with the agent's `run` method.
|
||||
|
||||
In a scenario with custom agents and built-in threads, the developer experience
|
||||
is simpler for agent without conversation state, as the thread abstraction
|
||||
is already provided by the framework and the agent can use it directly. Plus,
|
||||
the developer doesn't need to implement the conversation state management logic
|
||||
through the agent's other methods, which will mostly likely be boilerplate code.
|
||||
|
||||
In a scenario with built-in agents and custom threads, the developer experience
|
||||
is nearly identical, as in either case the developer must ensure
|
||||
the agent's `run` method is compatible with the thread or general state object.
|
||||
|
||||
In a scenario with custom agents and custom threads, the developer experience
|
||||
is nearly identical, as in either case the developer must ensure
|
||||
the agent's `run` method is compatible with the thread or general state object,
|
||||
and that the state management logic is implemented in the agent or the thread.
|
||||
|
||||
| Scenario | Agent with Conversation State | Agent without Conversation State |
|
||||
|----------|------------------------------------------|---------------------------------------------|
|
||||
| Built-in Agents, Built-in Threads | Simpler -- it should just work as there is no compatibility issue at runtime | Developer must ensure thread compatibility with agent's `run` method at runtime |
|
||||
| Custom Agents, Built-in Threads | Developer must implement state management methods on the agent. | Simpler, as thread abstraction is provided by the framework and agent can use it directly |
|
||||
| Built-in Agents, Custom Threads | Developer must ensure compatibility of the custom thread or state with agent's `run` method | Developer must ensure compatibility of the custom thread with agent's `run` method |
|
||||
| Custom Agents, Custom Threads | Developer is fully responsible for implementing state management. | Developer is fully responsible for implementing state management. |
|
||||
|
||||
Overall, the agent without conversation state abstraction
|
||||
provides a simpler and more consistent developer experience, as it relies on
|
||||
the thread abstraction provided by the framework. The downside is that
|
||||
developer must ensure the thread used is compatible with the agent's `run` method
|
||||
-- this can be mitigated by enforcing strong types and validation, as well as
|
||||
built-in factory methods for creating new threads given the agent type.
|
||||
|
||||
Another factor to consider is that Semantic Kernel already has agent abstraction
|
||||
that passes a thread per invocation, so it is easier for us to migrate to the
|
||||
new interface.
|
||||
|
||||
> **We should continue to question this decision as we implement more agents and workflows, and revisit the design.**
|
||||
@@ -0,0 +1,129 @@
|
||||
## Evaluation
|
||||
|
||||
The goal of Evaluation is to enable developers measure both the quality of agent responses and the efficiency of their decision-making processes.
|
||||
|
||||
### Core Evaluation Concepts
|
||||
|
||||
To enable effective evaluation (mindful of the fact that agents may be implemented with different approaches or even frameworks), it is useful to focus on the following core concepts:
|
||||
|
||||
- **Standardized Trajectory Format**: A unified representation of agent interactions (messages, tool calls, events) enabling consistent evaluation across different agent implementations.
|
||||
- **Trajectory and Outcome Evaluation**: Analyze both the path an agent takes and the final response it generates. This includes evaluating the sequence of tool calls, the order of operations, and the final output.
|
||||
|
||||
### Evaluation Components
|
||||
|
||||
The framework provides these key evaluation components:
|
||||
|
||||
- **Trajectory Converter**: Transforms agent runs from various frameworks into a standardized format for evaluation.
|
||||
- **Metrics Library**:
|
||||
- Computation-based metrics: Direct algorithms that calculate objective measures without requiring a model
|
||||
- Model-based metrics: Evaluation criteria that require an AI model to assess subjective qualities
|
||||
- **Judge**: For model-based metrics, a judge is the LLM responsible for applying evaluation criteria. Different judge models can be selected based on evaluation needs.
|
||||
- **Evaluator**: Coordinates the evaluation process by running computation-based metrics directly and applying judges to model-based metrics.
|
||||
- **Integration**: Connect with cloud evaluation services including Azure AI Evaluation.
|
||||
|
||||
### (Example) Metrics
|
||||
|
||||
Metrics may be pointwise (evaluating a single response on some criteria) or pairwise (evaluating two responses against each other e.g., where some ground truth is available).
|
||||
|
||||
#### Computation-based Metrics
|
||||
|
||||
- **Tool Match**: Measures tool call sequence matching in various ways:
|
||||
- Exact Match: Perfect match with reference sequence
|
||||
- In-Order Match: Required tools called in correct order (extra steps allowed)
|
||||
- Any-Order Match: All required tools called regardless of order
|
||||
- **Precision**: Proportion of agent's tool calls that match reference tool calls.
|
||||
- **Recall**: Proportion of reference tool calls included in the agent's tool calls.
|
||||
- **Single Tool Usage**: Checks if a specific tool was used during the trajectory.
|
||||
- **Tool Call Errors**: Measures rate of tool call failures or errors.
|
||||
- **Latency**: Time required for agent to complete its task.
|
||||
|
||||
#### Model-based Metrics
|
||||
|
||||
- **Task Adherence**: Evaluates how well the agent's response addresses the assigned task.
|
||||
- **Coherence**: Assesses logical flow and internal consistency of the response.
|
||||
- **Safety**: Detects potential harmful content in responses.
|
||||
- **Follows Trajectory**: Evaluates if the response logically follows from the tools used.
|
||||
- **Efficiency**: Measures if the agent took an optimal path to reach the solution.
|
||||
|
||||
This can build on the suite of metrics provided by [Azure AI evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/agent-evaluate-sdk).
|
||||
|
||||
### Sample Developer Experience
|
||||
|
||||
**Sample Developer Experience:**
|
||||
|
||||
1. **Run Agent**: Execute your agent on tasks to generate trajectories.
|
||||
2. **Create Trajectory**: Structure task, run data, and optional reference.
|
||||
3. **Configure Metrics**: Select pre-built or custom metrics for evaluation.
|
||||
4. **Evaluate**: Run evaluator to get scores and detailed results.
|
||||
5. **Analyze**: Review metrics to identify improvements.
|
||||
|
||||
```python
|
||||
from azure.ai.evaluation import AzureOpenAIModelConfiguration
|
||||
from agent_framework.evaluation import (
|
||||
TrajectoryMatchMetric,
|
||||
TaskAdherenceMetric,
|
||||
Evaluator,
|
||||
Trajectory
|
||||
)
|
||||
|
||||
# Model configuration for judge
|
||||
model_config = AzureOpenAIModelConfiguration(
|
||||
azure_deployment="o3-mini",
|
||||
api_version="2024-02-01",
|
||||
temperature=0
|
||||
)
|
||||
|
||||
# Run your agent
|
||||
task = "What's the weather in Seattle?"
|
||||
run = your_agent.run(task)
|
||||
|
||||
# Create trajectory object
|
||||
trajectory = Trajectory(
|
||||
task=task,
|
||||
run=run,
|
||||
reference=[ # Optional reference trajectory
|
||||
{"type": "tool_call", "tool": "weather_api", "args": {"location": "Seattle"}},
|
||||
{"type": "response", "content": "Weather information for Seattle"}
|
||||
]
|
||||
)
|
||||
|
||||
# Define metrics
|
||||
trajectory_match = TrajectoryMatchMetric(match_type="exact")
|
||||
task_adherence = TaskAdherenceMetric(
|
||||
criteria={
|
||||
"Task adherence": (
|
||||
"Does the response address the user's request and incorporate "
|
||||
"information from tool calls appropriately?"
|
||||
)
|
||||
},
|
||||
rating_rubric={
|
||||
"5": "Excellent - Fully addresses task with complete detail",
|
||||
"4": "Good - Addresses most aspects effectively",
|
||||
"3": "Adequate - Addresses core task, minor gaps",
|
||||
"2": "Poor - Partial addressing with significant gaps",
|
||||
"1": "Inadequate - Fails to address task properly"
|
||||
}
|
||||
)
|
||||
|
||||
# Create evaluator
|
||||
evaluator = Evaluator(
|
||||
metrics=[trajectory_match, task_adherence],
|
||||
model_config=model_config,
|
||||
trajectory=trajectory
|
||||
)
|
||||
|
||||
# Run evaluation
|
||||
result = evaluator.run()
|
||||
|
||||
# Results follow Azure format
|
||||
print("Evaluation Results:")
|
||||
for metric_name, score in result.items():
|
||||
if isinstance(score, dict):
|
||||
print(f"{metric_name}: {score.get('score', 'N/A')}")
|
||||
print(f" Result: {score.get('result', 'N/A')}")
|
||||
print(f" Reason: {score.get('reason', 'N/A')}")
|
||||
else:
|
||||
print(f"{metric_name}: {score}")
|
||||
|
||||
|
||||
```
|
||||
@@ -0,0 +1,50 @@
|
||||
# Guardrails
|
||||
|
||||
The design goal is to provide a flexible and extensible way to implement guardrails
|
||||
and a built-in set of guardrails that can be used for common use cases.
|
||||
|
||||
> NOTE: this is work in progress.
|
||||
|
||||
Guardrails can be template-based to adapt to different input data types, which
|
||||
include:
|
||||
- `Message` for agent messages.
|
||||
- `ToolCall` for tool call requests.
|
||||
- `ToolResult` for tool call results.
|
||||
|
||||
Guardrails are added to other components such as `ModelClient` and `MCPServer`
|
||||
as hooks that are called before and after the main logic of the component.
|
||||
|
||||
For example, the `ModelClient` has methods to add input and output guardrails.
|
||||
|
||||
```python
|
||||
model_client = ModelClient(...)
|
||||
model_client.add_input_guardrails([
|
||||
PIIGuardrail[Message](...),
|
||||
SensitiveDataGuardrail[Message](...),
|
||||
])
|
||||
model_client.add_output_guardrails([
|
||||
HarmfulContentGuardrail[Message](...),
|
||||
])
|
||||
```
|
||||
|
||||
Another example to show how to use a guardrail with an MCP server:
|
||||
|
||||
```python
|
||||
guardrail = PIIGuardrail(
|
||||
config={
|
||||
"rules": [
|
||||
{
|
||||
"type": "email",
|
||||
"action": "block"
|
||||
},
|
||||
{
|
||||
"type": "phone",
|
||||
"action": "block"
|
||||
}
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
mcp_server = MCPServer(...)
|
||||
mcp_server.add_output_guardrail(guardrail)
|
||||
```
|
||||
@@ -0,0 +1,71 @@
|
||||
# MCP Servers
|
||||
|
||||
An MCP server is a component that wraps a session to an
|
||||
[Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server.
|
||||
|
||||
The tools provided by MCP server should match the tool interface to ensure
|
||||
minimal boilerplate code when dealing with both tools and MCP servers.
|
||||
|
||||
Other features like sampling and resources, should be accessible through
|
||||
the MCP server interface as well.
|
||||
|
||||
|
||||
## MCP Server base class (draft)
|
||||
|
||||
```python
|
||||
|
||||
class MCPServer(ABC):
|
||||
"""The base class for all MCP servers in the framework."""
|
||||
|
||||
@abstractmethod
|
||||
async def list_tools(self, context: Context) -> list[ToolSchema]:
|
||||
"""List all available tools in the MCP server.
|
||||
|
||||
Returns:
|
||||
A list of tool schemas available in the MCP server.
|
||||
"""
|
||||
...
|
||||
|
||||
@abstractmethod
|
||||
async def call_tool(
|
||||
self,
|
||||
call: ToolCall,
|
||||
context: Context,
|
||||
) -> ToolResult:
|
||||
"""Call a tool with the given name and arguments.
|
||||
|
||||
Args:
|
||||
tool_name: The name of the tool to call.
|
||||
args: The arguments to pass to the tool.
|
||||
context: The context for the current invocation of the MCP server.
|
||||
|
||||
Returns:
|
||||
The result of calling the tool.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_input_guardrails(
|
||||
self,
|
||||
guardrails: list[InputGuardrail[ToolCall]]
|
||||
) -> None:
|
||||
"""Add input guardrails to the MCP server.
|
||||
|
||||
Args:
|
||||
guardrails: The list of input guardrails to add.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_output_guardrails(
|
||||
self,
|
||||
guardrails: list[OutputGuardrail[ToolResult]]
|
||||
) -> None:
|
||||
"""Add output guardrails to the MCP server.
|
||||
|
||||
Args:
|
||||
guardrails: The list of output guardrails to add.
|
||||
"""
|
||||
...
|
||||
|
||||
```
|
||||
|
||||
MCP specs have other APIs. We should consider adding them as well.
|
||||
@@ -0,0 +1,85 @@
|
||||
# Model Clients
|
||||
|
||||
A model client is a component that implements a unified interface for
|
||||
interacting with different language models. It exposes a standardized metadata
|
||||
about the model it provides (e.g., model name, tool call and vision capabilities, etc.)
|
||||
to support validation and composition with other components.
|
||||
|
||||
The framework provides a set of pre-built model clients:
|
||||
|
||||
- `OpenAIChatCompletionClient`
|
||||
- `AzureOpenAIChatCompletionClient`
|
||||
- `AzureOpenAIResponseClient`
|
||||
- `AzureAIClient`
|
||||
- `AnthropicClient`
|
||||
- `GeminiClient`
|
||||
- `HuggingFaceClient`
|
||||
- `OllamaClient`
|
||||
- `VLLMClient`
|
||||
- `ONNXRuntimeClient`
|
||||
- `BedrockClient`
|
||||
- `NIMClient`
|
||||
|
||||
Prompt template is a component that is used by model clients to generate prompts with parameters set based on some injected context.
|
||||
prompts with parameters set based on some injected context.
|
||||
This gets into the actual interface and implementation detail of model clients,
|
||||
so we just mention it here.
|
||||
|
||||
The design goal is to provide integration with a wide range of model providers,
|
||||
including both open-source and commercial models, while maintaining a consistent
|
||||
interface for developers to use.
|
||||
|
||||
## `ModelClient` base class (draft)
|
||||
|
||||
```python
|
||||
class ModelClient(ABC):
|
||||
"""The base class for all model clients in the framework."""
|
||||
|
||||
@abstractmethod
|
||||
async def create(
|
||||
self,
|
||||
thread: Thread,
|
||||
context: Context,
|
||||
stream: bool = False,
|
||||
tools: Optional[list[Tool]] = None,
|
||||
output_format: Optional[OutputFormat] = None,
|
||||
) -> Message:
|
||||
"""Generate a response from the model based on the provided messages.
|
||||
|
||||
Args:
|
||||
thread: The conversation context to generate a response.
|
||||
context: The context for the current invocation of the model client.
|
||||
This is for accessing event channels for streaming tokens.
|
||||
stream: Whether to stream the response tokens.
|
||||
tools: Optional list of tools to use for tool calling.
|
||||
output_format: Optional structured output format for the response.
|
||||
If provided, the model will generate a response in this format
|
||||
and returns a structured response message.
|
||||
|
||||
Returns:
|
||||
The generated response message.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_input_guardrails(
|
||||
self,
|
||||
guardrails: list[InputGuardrail[Message]]
|
||||
) -> None:
|
||||
"""Add input guardrails to the model client.
|
||||
|
||||
Args:
|
||||
guardrails: The list of input guardrails to add.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_output_guardrails(
|
||||
self,
|
||||
guardrails: list[OutputGuardrail[Message]]
|
||||
) -> None:
|
||||
"""Add output guardrails to the model client.
|
||||
|
||||
Args:
|
||||
guardrails: The list of output guardrails to add.
|
||||
"""
|
||||
...
|
||||
```
|
||||
@@ -0,0 +1,3 @@
|
||||
# Observability and Monitoring
|
||||
|
||||
Traces should follow the [OTEL GenAI Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
|
||||
@@ -0,0 +1,10 @@
|
||||
# Optimization and Tuning
|
||||
|
||||
> For future consideration.
|
||||
|
||||
The framework should support optimization of agents and workflows
|
||||
with task feedback, by tuning the various components such as
|
||||
system prompts, model parameters, and tool configurations.
|
||||
|
||||
We should also consider fine-tuning of the models and embeddings
|
||||
as part of the optimization process.
|
||||
@@ -0,0 +1,29 @@
|
||||
# Threads
|
||||
|
||||
Threads are stateful objects to manage the conversation context of an agent or a workflow.
|
||||
They are meant to be shown to the user as part of a user interface.
|
||||
They can be persisted to a database or a file system, and used to
|
||||
resume a previous user session.
|
||||
|
||||
Thread should use message and content types as defined in [Core Data Types](types.md).
|
||||
|
||||
A thread can contain sub-threads as a dictionary of threads.
|
||||
This is to ensure agents in a workflow can run concurrently on different threads.
|
||||
The default thread has the key `main` and the sub-threads having keys that are usually
|
||||
corresponding to the agents in a workflow.
|
||||
|
||||
For workflows, thread should also support the concept of execution state, which includes:
|
||||
- The history of steps taken.
|
||||
- The current step in the workflow.
|
||||
- The next steps to be taken.
|
||||
|
||||
This is to ensure the workflow can be resumed from where it left off, without losing
|
||||
the state of execution.
|
||||
|
||||
The framework should provides default implementations of a thread class that:
|
||||
- Can be backed by a database (i.e., Redis) or a file system (i.e., JSON file).
|
||||
- Can be backed by the Foundry Agent Service.
|
||||
- Can be copied and forked.
|
||||
- Can be serialized and deserialized to/from JSON.
|
||||
- Can support checkpointing, rollback, and time travel, for both agent and workflow.
|
||||
- Can automantically export truncated views to be used by model clients to keep the context size within limits.
|
||||
@@ -0,0 +1,215 @@
|
||||
# Tools
|
||||
|
||||
> The design goal is to make it easy to create new tools and integrate existing APIs and make them available to agents.
|
||||
|
||||
A tool is a component that can be used to invoke procedure code
|
||||
and returns a well-defined result type to the caller.
|
||||
|
||||
The result type should indicate the success or failure of the invocation,
|
||||
as well as the output of the invocation in terms of the core data types.
|
||||
There may be other fields in the result type for things like
|
||||
side effects, etc.. We should address this when designing the
|
||||
tool interface.
|
||||
|
||||
A tool may have arguments for invocation.
|
||||
The arguments must be defined using JSON schema that language model supports.
|
||||
|
||||
A tool may have dependencies such as tokens, credentials,
|
||||
or output message channels that will be provided by through
|
||||
a context variable passed to the tool when it is invoked.
|
||||
|
||||
A tool may also have guardrails that are used to ensure the
|
||||
tool is invoked with proper arguments, or that the agent has the
|
||||
right context such as human approval to invoke the tool.
|
||||
|
||||
The framework provides a set of pre-built tools:
|
||||
|
||||
- `FunctionTool`: a tool that wraps a function.
|
||||
- `AzureAISearchTool`: a tool that is backed by Azure AI Search Service.
|
||||
- `OpenAPITool`: a tool that is backed by a service that defines an OpenAPI spec.
|
||||
- Other tools backed by Foundry.
|
||||
|
||||
## `Tool` base class
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ToolResult:
|
||||
"""The result of running a tool."""
|
||||
is_error: bool
|
||||
output: List[ImageContent | TextContent] # The content types are defined as part of the core data types.
|
||||
... # Other fields, could be extended to include more for application-specific needs.
|
||||
|
||||
class Tool(ABC):
|
||||
"""The base class for all tools in the framework."""
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
"""The name of the tool, used to identify it in the system."""
|
||||
...
|
||||
|
||||
@property
|
||||
def description(self) -> str:
|
||||
"""The description of the tool, used to provide information about its
|
||||
functionality.
|
||||
"""
|
||||
...
|
||||
|
||||
@property
|
||||
def schema(self) -> ToolSchema:
|
||||
"""The schema of the tool, which defines the JSON schema of the input
|
||||
arguments."""
|
||||
...
|
||||
|
||||
@property
|
||||
def strict(self) -> bool:
|
||||
"""Whether the JSON schema is in strict mode. If true, no optional
|
||||
arguments are allowed.
|
||||
"""
|
||||
...
|
||||
|
||||
async def __call__(
|
||||
self,
|
||||
call: ToolCall,
|
||||
context: Context,
|
||||
) -> ToolResult:
|
||||
"""The method to call to run the tool with arguments and return the result.
|
||||
|
||||
Args:
|
||||
call: The tool call containing the name and arguments to pass to the tool.
|
||||
context: The context for the current invocation of the tool, providing
|
||||
access to the event channel, and human-in-the-loop (HITL) features.
|
||||
|
||||
Returns:
|
||||
The result of running the tool.
|
||||
"""
|
||||
try:
|
||||
# Call the on_invoke method to allow for input guardrails to be applied
|
||||
# to the arguments before the tool is run.
|
||||
await self.on_invoke(args, context)
|
||||
# Call the run method to actually run the tool.
|
||||
result = await self.run(args, context)
|
||||
# Call the on_output method to handle the output of the tool.
|
||||
result = await self.on_output(result, context)
|
||||
except Exception as e:
|
||||
# If an error occurs, call the on_error method to handle it.
|
||||
result = await self.on_error(e, context)
|
||||
return result
|
||||
|
||||
@abstractmethod
|
||||
async def run(
|
||||
self,
|
||||
calls: ToolCall,
|
||||
context: Context,
|
||||
) -> ToolResult:
|
||||
"""The method called by the tool itself to run the tool with arguments and return the result."""
|
||||
...
|
||||
|
||||
async def on_invoke(
|
||||
self,
|
||||
calls: ToolCall,
|
||||
context: Context,
|
||||
) -> None:
|
||||
"""The method called by the tool when is invoked but before it is run.
|
||||
|
||||
This is useful for input guardrails to be applied to the arguments
|
||||
before the tool is run.
|
||||
"""
|
||||
...
|
||||
|
||||
async def on_error(
|
||||
self,
|
||||
error: Exception,
|
||||
context: Context,
|
||||
) -> ToolResult:
|
||||
"""The method called by the tool when an error is raised."""
|
||||
...
|
||||
|
||||
async def on_output(
|
||||
self,
|
||||
output: ToolResult,
|
||||
context: Context,
|
||||
) -> ToolResult:
|
||||
"""The method called by the tool when the output is ready.
|
||||
|
||||
This is where output guardrails can be applied to the result
|
||||
before it is returned to the caller.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_input_guardrails(
|
||||
self,
|
||||
guardrails: list[InputGuardrail[ToolCall]]
|
||||
) -> None:
|
||||
"""Add input guardrails to the tool.
|
||||
|
||||
Args:
|
||||
guardrails: The list of input guardrails to add.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_output_guardrails(
|
||||
self,
|
||||
guardrails: list[OutputGuardrail[ToolResult]]
|
||||
) -> None:
|
||||
"""Add output guardrails to the tool.
|
||||
|
||||
Args:
|
||||
guardrails: The list of output guardrails to add.
|
||||
"""
|
||||
...
|
||||
|
||||
def add_on_error_func(
|
||||
self,
|
||||
on_error_func: Callable[[Exception, Context], Awaitable[ToolResult]]
|
||||
) -> None:
|
||||
"""Add a function to call when an error is raised during the call to `run`.
|
||||
|
||||
Args:
|
||||
on_error_func: The function to call when an error is raised.
|
||||
"""
|
||||
...
|
||||
```
|
||||
|
||||
## `FunctionTool`
|
||||
|
||||
The `FunctionTool` is a decorator that can be used to create a tool from a function.
|
||||
|
||||
```python
|
||||
@FunctionTool
|
||||
def web_search(
|
||||
query: str,
|
||||
num_results: int = 10,
|
||||
) -> str:
|
||||
"""A tool that performs a web search and returns the results."""
|
||||
...
|
||||
```
|
||||
|
||||
`FunctionTool` supports customization of the following:
|
||||
- `name`
|
||||
- `description`
|
||||
- `on_error_func`: the function to call when an error is raised during the call to `run`.
|
||||
- `strict`: whether the JSON schema is in strict mode. If true, no optional
|
||||
arguments are allowed.
|
||||
- `input_guardrails`: a list of input guardrails to apply to the arguments
|
||||
before the tool is run.
|
||||
- `output_guardrails`: a list of output guardrails to apply to the result
|
||||
before it is returned.
|
||||
|
||||
## `AgentTool`
|
||||
|
||||
The `AgentTool` is a wrapper around an agent that can be used as a tool.
|
||||
|
||||
```python
|
||||
agent = SomeAgent(...)
|
||||
tool = AgentTool(
|
||||
agent=agent,
|
||||
name="SomeAgent",
|
||||
description="Some description of this agent tool.",
|
||||
output_extractor=..., # Optional, a function to extract a ToolResult from the agent's run Result.
|
||||
on_error_func=..., # Optional, a function to call when an error is raised during the call to `run`.
|
||||
)
|
||||
```
|
||||
|
||||
The argument to the `AgentTool` is a single string.
|
||||
|
||||
> NOTE: Do we also need to support passing a thread to the agent tool?
|
||||
@@ -0,0 +1,118 @@
|
||||
# Core Data Types
|
||||
|
||||
A design goal of the new framework to simplify the interaction between agent components
|
||||
through a common set of data types, minimizing boilerplate code
|
||||
in the application for transforming data between components.
|
||||
|
||||
For example, text, images, function calls, tool schema are
|
||||
all examples of such data types.
|
||||
These data types are used to interact with agent components (model clients, tools, MCP, threads, and memory),
|
||||
forming the connective tissue between those components.
|
||||
|
||||
In AutoGen, these are the data types mostly defined in `autogen_core.models` module,
|
||||
and others like `autogen_core.Image` and `autogen_core.FunctionCall`. This is just
|
||||
an example as AutoGen has no formal definition of model context.
|
||||
|
||||
To start, we should follow [MEAI](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.ai?view=net-9.0-pp).
|
||||
|
||||
This document describes the data types from Python perspective,
|
||||
while for .NET, we should directly use the MEAI data types.
|
||||
|
||||
## Content types
|
||||
|
||||
```python
|
||||
class AIContent(ABC):
|
||||
"""Base class for all AI content types."""
|
||||
additional_properties: Dict[str, Any] = field(default_factory=dict)
|
||||
"""Additional properties for extensibility, allowing custom fields."""
|
||||
|
||||
class DataContent(AIContent):
|
||||
"""Data content type."""
|
||||
data: bytes # Raw binary data.
|
||||
media_type: str # MIME type of the data, e.g., "image/png", "application/json"
|
||||
uri: str # URI constructed from the data.
|
||||
base64: str # Base64 encoded data for easy transport.
|
||||
|
||||
class ErrorContent(AIContent):
|
||||
"""Error content type."""
|
||||
details: str # Detailed error message.
|
||||
error_code: str # Error code for programmatic handling.
|
||||
message: str # Human-readable error message.
|
||||
|
||||
class FunctionCallContent(AIContent):
|
||||
"""Function call content type."""
|
||||
name: str # Name of the function to call.
|
||||
arguments: Dict[str, Any] # Arguments for the function call, serialized as JSON.
|
||||
call_id: str # Unique identifier for the function call.
|
||||
exception: Optional[Exception] = None # Optional exception for error occurred while mapping the original function call data to this content type.
|
||||
|
||||
class FunctionResultContent(AIContent):
|
||||
"""Function result content type."""
|
||||
call_id: str # Unique identifier for the function call.
|
||||
result: Any # Result of the function call, or a generic error message.
|
||||
exception: Optional[Exception] = None # Optional exception for error occurred while executing the function call.
|
||||
|
||||
class TextContent(AIContent):
|
||||
"""Text content type."""
|
||||
text: str
|
||||
|
||||
class TextReasoningContent(AIContent):
|
||||
"""Text reasoning content type."""
|
||||
text: str
|
||||
|
||||
|
||||
class UriContent(AIContent):
|
||||
"""URI content type."""
|
||||
uri: str # URI of the content, e.g., a link to an image or document.
|
||||
media_type: str # MIME type of the content, e.g., "image/png", "application/pdf".
|
||||
|
||||
|
||||
class UsageDetails:
|
||||
input_token_count: Optional[int] = None
|
||||
output_token_count: Optional[int] = None
|
||||
additional_counts: Optional[Dict[str, int]] = None
|
||||
total_token_count: Optional[int] = None
|
||||
|
||||
|
||||
class UsageContent(AIContent):
|
||||
"""Usage content type."""
|
||||
details: UsageDetails
|
||||
|
||||
```
|
||||
|
||||
## `ChatMessage`
|
||||
|
||||
A message in a thread that is sent to or received from a model client.
|
||||
|
||||
> Should we use `Message` instead of `ChatMessage`?
|
||||
|
||||
> We may need to extend this class to support more framework-level functionalities
|
||||
> such as handoff, stopping, and so on?
|
||||
|
||||
```python
|
||||
class ChatRole(Enum):
|
||||
"""The role of the author in a chat message."""
|
||||
USER = "user"
|
||||
ASSISTANT = "assistant"
|
||||
SYSTEM = "system"
|
||||
TOOL = "tool"
|
||||
|
||||
class ChatMessage:
|
||||
message_id: str # Unique identifier for the message.
|
||||
author: str # Unique identifier for the author of the message.
|
||||
role: ChatRole # Role of the author in the chat, e.g., user, assistant, system, tool.
|
||||
contents: List[AIContent] # List of content types in the message, e.g., text, images, function calls.
|
||||
```
|
||||
|
||||
## Tool types
|
||||
|
||||
Align with the [MEAI tool types](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.ai.aifunction?view=net-9.0-pp)
|
||||
in terms of the core attributes and methods.
|
||||
|
||||
See [Tools](./tools.md) for more details.
|
||||
|
||||
## Model client types
|
||||
|
||||
Align with the MEAI model client types in terms of the core attributes and methods.
|
||||
|
||||
See [Models](./models.md) for more details.
|
||||
@@ -0,0 +1,38 @@
|
||||
# Vector Stores and Embedding Clients
|
||||
|
||||
A vector store is component that provides a unified interface for
|
||||
interacting with different vector databases, similar to model clients.
|
||||
It exposes indexing and querying methods, including vector, text-based
|
||||
and hybrid queries.
|
||||
|
||||
The details can be filled in based on the existing vector abstraction
|
||||
in Semantic Kernel.
|
||||
|
||||
The framework provides pre-built vector stores (already exist in
|
||||
Semantic Kernel):
|
||||
|
||||
- Azure AI Search
|
||||
- Cosmos DB
|
||||
- Chroma
|
||||
- Couchbase
|
||||
- Elasticsearch
|
||||
- Faiss
|
||||
- In-memory
|
||||
- JDBC
|
||||
- MongoDB
|
||||
- Pinecone
|
||||
- Postgres
|
||||
- Qdrant
|
||||
- Redis
|
||||
- SQL Server
|
||||
- SQLite
|
||||
- Volatile
|
||||
- Weaviate
|
||||
|
||||
Many vector store implementations will require embedding clients
|
||||
to function. An embedding client is a component that implements a unified interface
|
||||
to interact with different embedding models.
|
||||
|
||||
The framework provides a set of pre-built embedding clients:
|
||||
|
||||
- TBD.
|
||||
@@ -0,0 +1,201 @@
|
||||
# Workflow
|
||||
|
||||
The design goal is to create workflows that can be specified in a declarative
|
||||
way to allow for easy creation and modification without needing to change the
|
||||
underlying code.
|
||||
|
||||
## `Workflow` is Agent
|
||||
|
||||
A `Workflow` is an agent composed of other agents. It follows the same interface
|
||||
as an agent. This allows for nested workflows, where a workflow can contain other
|
||||
workflows.
|
||||
|
||||
## Agents in a `Workflow`
|
||||
|
||||
Each agent (or a `Workflow`) in a `Workflow` has a thread on which it will
|
||||
always run. The thread may be privated, or shared among some or all of the agents.
|
||||
|
||||
When do agents share a `thread`?
|
||||
|
||||
- When an agent is called through handoff or as a tool by another agent, the caller
|
||||
agent's thread may be shared with the callee agent.
|
||||
|
||||
When do agents not share a `thread`?
|
||||
|
||||
- When a set of worker agents are called through a "fan-out" and "fan-in" pattern, where the worker
|
||||
agents are called in parallel and the results are combined by an aggregator agent.
|
||||
|
||||
Thread sharing can be configured through the `Workflow`'s constructor.
|
||||
By default, each agent has its own private thread and no sharing.
|
||||
|
||||
See [Threads](threads.md) for more details on how threads work.
|
||||
|
||||
## `Workflow` from control flow graph
|
||||
|
||||
A `Workflow` can be created from a control flow graph of agents.
|
||||
The graph is a directed graph where each node is an agent and each edge
|
||||
is a transition between agents. The graph can contain loops
|
||||
and conditional transitions.
|
||||
|
||||
The control flow graph specifies the order in which agents are called
|
||||
and the conditions under which they are called.
|
||||
|
||||
```python
|
||||
# Create agent instances.
|
||||
agent1 = MCPAgent(
|
||||
model_client="OpenAIChatCompletionClient",
|
||||
mcp_server=["MCPServer1", "MCPServer2"],
|
||||
)
|
||||
agent2 = MCPAgent(
|
||||
model_client="OpenAIChatCompletionClient",
|
||||
mcp_server=["MCPServer3", "MCPServer4"],
|
||||
)
|
||||
agent3 = MCPAgent(
|
||||
model_client="OpenAIChatCompletionClient",
|
||||
mcp_server="MCPServer5",
|
||||
)
|
||||
|
||||
# Create a directed graph of agents with conditional loops and transitions.
|
||||
# The graph builder validates the graph.
|
||||
graph = GraphBuilder() \
|
||||
.add_agent(agent1) \
|
||||
.add_agent(agent2) \
|
||||
.add_agent(agent3) \
|
||||
.add_loop(agent1, agent2, conditions=Any(...)) \
|
||||
.add_transition(agent2, agent3, conditions=Any(..., All(...))]) \
|
||||
.build()
|
||||
|
||||
# Create a workflow from the graph.
|
||||
workflow = Workflow(graph=graph)
|
||||
```
|
||||
|
||||
## `Workflow` from message router
|
||||
|
||||
By default, each message is delivered to an _inbox_ of every agent in a `Workflow`.
|
||||
When an agent is called, the inbox is cleared and the messages are added
|
||||
to the thread that is used by the agent.
|
||||
|
||||
If multiple agents share a thread, each message is added exactly once to the thread.
|
||||
|
||||
To customize the message flow, we can configure how each inbox behaves.
|
||||
Each agent's inbox can be configured to only accept messages from a specific sender(s).
|
||||
We can also configure the inbox batch size, time-to-live for messages in the inbox
|
||||
and various other parameters that controls how the inbox is processed.
|
||||
|
||||
The configuration of agents' inboxes is done using a `Router` object,
|
||||
which can be built using a `RouterBuilder` object.
|
||||
|
||||
```python
|
||||
graph = ...
|
||||
|
||||
router = RouterBuilder() \
|
||||
.add_route(source=agent1, target=agent2) \ # Agent2 will receive messages from agent1.
|
||||
.add_route(source=[agent1, agent2], target=agent3, batch_size=10, ttl="1h") \ # Agent3 will receive messages from agent1 and agent2, with a batch size of 10 and a time-to-live of 1 hour.
|
||||
.add_route(source=Router.ANY, target=agent4) \ # Agent4 will receive all messages.
|
||||
).build()
|
||||
|
||||
# Create a workflow from the graph and router.
|
||||
workflow = Workflow(graph=graph, router=router)
|
||||
```
|
||||
|
||||
You can also skip the graph all together and just create a workflow from the router.
|
||||
In this case, all agents will run concurrently to process the messages delivered
|
||||
to their inboxes, according to the inbox rules.
|
||||
|
||||
```python
|
||||
# Create a workflow from the router.
|
||||
workflow = Workflow(router=router)
|
||||
```
|
||||
|
||||
The validation of the router is done as part of the workflow creation, to ensure
|
||||
that no gap exists in the routing, and warning for cascading routes.
|
||||
|
||||
## Run `Workflow`
|
||||
|
||||
It is the same as running an agent.
|
||||
|
||||
```python
|
||||
# Create a message batch to send to the workflow.
|
||||
# The run context is used to pass in the event channel and other context
|
||||
# shared by the agents.
|
||||
thread = [
|
||||
Message("Hello"),
|
||||
Message("Can you find the file 'foo.txt' for me?"),
|
||||
]
|
||||
context = RunContext(event_channel="console")
|
||||
result = await workflow.run(thread, context=context)
|
||||
```
|
||||
|
||||
## `Workflow` has a final response
|
||||
|
||||
A `Workflow` is expected to have a final response, which is the final response in the
|
||||
result of the last agent in the workflow. The final response is returned as part of the
|
||||
`Result` object returned by the `run` method.
|
||||
|
||||
This is to ensure the workflow can be used in the same way as an agent.
|
||||
|
||||
## Stopping `Workflow`
|
||||
|
||||
A `workflow` may run indefinitely, so it is important to have a way to stop it.
|
||||
|
||||
```python
|
||||
# Use a stopping condition to stop the workflow when the condition is met.
|
||||
# Detail design TBD.
|
||||
condition = StopCondition(
|
||||
condition=Any(...),
|
||||
timeout="1h",
|
||||
)
|
||||
workflow = Workflow(graph=graph, stop_condition=condition)
|
||||
```
|
||||
|
||||
TBD.
|
||||
|
||||
## `Workflow` can be stateless
|
||||
|
||||
The workflow state is kept in the thread object as input to the `run` method.
|
||||
If not provided, the workflow will create new sub-threads for each agent
|
||||
in the workflow for their private threads, otherwise, the workflow will
|
||||
use the provided sub-thread.
|
||||
|
||||
```python
|
||||
# Create a workflow with a graph and router.
|
||||
workflow = Workflow(graph=graph, router=router, stop_condition=condition)
|
||||
|
||||
# Create a new thread.
|
||||
thread = [
|
||||
Message("Hello"),
|
||||
Message("Can you find the file 'foo.txt' for me?"),
|
||||
]
|
||||
|
||||
# Run the workflow.
|
||||
result = await workflow.run(thread, context=context)
|
||||
|
||||
# Update the thread with new messages from the user.
|
||||
thread = result.thread + [
|
||||
Message("Can you find the file 'bar.txt' for me?"),
|
||||
]
|
||||
|
||||
# Resume the workflow from where it left off.
|
||||
result = await workflow.run(thread, context=context)
|
||||
```
|
||||
|
||||
Read more about [Threads](threads.md) for more details on threads.
|
||||
|
||||
## Pre-defined workflows
|
||||
|
||||
The framework ships with a few pre-defined workflows for common orchestration
|
||||
patterns. These workflows can be used as-is or as a starting point for
|
||||
new developers, however, when using them, you should be aware of the underlying
|
||||
implementation and move on to custom workflows when a limit is reached.
|
||||
|
||||
The pre-defined workflows are:
|
||||
- `Sequential`: A sequential workflow that calls each agent in order,
|
||||
its message flow can be configured separately.
|
||||
- `MapReduce`: A map-reduce workflow that splits a task into smaller
|
||||
tasks, runs them in parallel and then combines the results.
|
||||
- `RoundRobinGroupChat`: agents are called in a round-robin fashion in a loop.
|
||||
- `SelectorGroupChat`: agents are selected on each iteration by the workflow's built-in
|
||||
LLM based selector.
|
||||
- `Swarm`: use handoffs.
|
||||
|
||||
The predefined workflows are implemented as subclasses of the `Workflow` class.
|
||||
Reference in New Issue
Block a user