Model Clients
A model client is a component that implements a unified interface for interacting with different language models. It exposes standardized metadata about the model it provides (e.g., model name, tool-call and vision capabilities) to support validation and composition with other components.
The framework provides a set of pre-built model clients:
OpenAIChatCompletionClient
AzureOpenAIChatCompletionClient
AzureOpenAIResponseClient
AzureAIClient
AnthropicClient
GeminiClient
HuggingFaceClient
OllamaClient
VLLMClient
ONNXRuntimeClient
BedrockClient
NIMClient
A prompt template is a component that model clients use to generate prompts, with parameters filled in from injected context. It touches on the concrete interface and implementation details of model clients, so it is only mentioned here.
The design goal is to provide integration with a wide range of model providers, including both open-source and commercial models, while maintaining a consistent interface for developers to use.
ModelClient Protocol (draft)
class ModelClient(Protocol):
    """A protocol for a model client that can generate chat responses."""

    async def generate_response(
        self,
        messages: Sequence[ChatMessage],
        **options,  # kwargs?
    ) -> ChatResponse:
        """Sends chat messages and returns the response.

        Args:
            messages: The sequence of chat messages to send.
            **options: Additional options for the chat request, such as model_id, temperature, etc.
                See `ChatOptions` for more details.

        Returns:
            The response messages generated by the client.

        Raises:
            ValueError: If the input message sequence is `None`.
        """
        ...

    async def generate_streaming_response(
        self,
        messages: Sequence[ChatMessage],
        **options,  # kwargs?
    ) -> AsyncIterable[ChatResponseUpdate]:
        """Sends chat messages and streams the response.

        Args:
            messages: The sequence of chat messages to send.
            **options: Additional options for the chat request, such as model_id, temperature, etc.
                See `ChatOptions` for more details.

        Returns:
            An async iterable of chat response updates containing the content of the response messages
            generated by the client.

        Raises:
            ValueError: If the input message sequence is `None`.
        """
        ...

    def add_input_guardrails(
        self,
        guardrails: list[InputGuardrail[ChatMessage]]
    ) -> None:
        """Add input guardrails to the model client.

        Args:
            guardrails: The list of input guardrails to add.
        """
        ...

    def add_output_guardrails(
        self,
        guardrails: list[OutputGuardrail[ChatResponse | Sequence[ChatResponseUpdate]]]
    ) -> None:
        """Add output guardrails to the model client.

        Args:
            guardrails: The list of output guardrails to add.
        """
        ...
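To make the shape of the protocol more concrete, here is a minimal sketch of a toy client that follows the two generation methods. Everything in it is an assumption made for illustration: `EchoClient` is hypothetical, the message and response types are simplified dataclass stand-ins rather than the Pydantic models described in this document, the guardrail methods are omitted, and the streaming method is written as an async generator, which is one common way to produce an async iterable but may not be the shape the final protocol settles on.

```python
import asyncio
from collections.abc import AsyncIterator, Sequence
from dataclasses import dataclass, field
from typing import Any


# Hypothetical, simplified stand-ins for the framework types.
@dataclass
class ChatMessage:
    role: str
    text: str


@dataclass
class ChatResponse:
    messages: list[ChatMessage] = field(default_factory=list)


@dataclass
class ChatResponseUpdate:
    text: str


class EchoClient:
    """Toy client that echoes the last message back; not a real provider."""

    async def generate_response(
        self, messages: Sequence[ChatMessage], **options: Any
    ) -> ChatResponse:
        if messages is None:
            raise ValueError("messages must not be None")
        last = messages[-1].text if messages else ""
        return ChatResponse(messages=[ChatMessage(role="assistant", text=last)])

    async def generate_streaming_response(
        self, messages: Sequence[ChatMessage], **options: Any
    ) -> AsyncIterator[ChatResponseUpdate]:
        if messages is None:
            raise ValueError("messages must not be None")
        last = messages[-1].text if messages else ""
        # Emit one update per word to imitate incremental output.
        for word in last.split():
            yield ChatResponseUpdate(text=word)


async def demo() -> tuple[str, list[str]]:
    client = EchoClient()
    history = [ChatMessage(role="user", text="hello model client")]
    # Options are passed as plain keyword arguments.
    response = await client.generate_response(history, temperature=0.0)
    chunks = [u.text async for u in client.generate_streaming_response(history)]
    return response.messages[0].text, chunks


full_text, streamed = asyncio.run(demo())
```

Note that the non-streaming call is awaited, while the streaming call is iterated with `async for`.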
ChatMessage and ChatResponse/StructuredResponse/ChatResponseUpdate Types
For the data types used to represent message and response contents, see docs/design/types.md.
class ChatMessage(BaseModel):
    author_name: str | None  # The name of the author of the message
    contents: list[AIContent]  # The contents of the message, which can include text, images, function calls, etc.
    message_id: str | None  # The ID of the message
    raw_representation: Any | None = None  # The raw representation of the chat message from an underlying implementation
    role: ChatRole  # The role of the author of the message
    additional_properties: dict[str, Any] | None = None

class ChatRole(BaseModel):
    """Describes the intended purpose of a message within a chat interaction."""
    value: str

    SYSTEM: ClassVar[Self]  # The role that instructs or sets the behaviour of the AI system.
    USER: ClassVar[Self]  # The role that provides user input for chat interactions.
    ASSISTANT: ClassVar[Self]  # The role that provides responses to system-instructed, user-prompted input.
    TOOL: ClassVar[Self]  # The role that provides additional information and references in response to tool use requests.
ChatRole is an enum-like class that defines the roles of the author in a chat message. We use a class with class variables to reduce the coupling between versions of the framework and the capabilities available in the various providers. This allows us to work with providers that support extra roles without having to change the framework code. (See also ChatFinishReason.)
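The enum-like pattern can be sketched with the standard library alone. This is an illustrative stand-in, not the framework's implementation: it uses a frozen dataclass instead of a Pydantic model, and the string values ("system", "user", and so on) as well as the extra "developer" role are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class ChatRole:
    """Stand-in for the Pydantic model: an open set of roles keyed by a string value."""

    value: str

    # Well-known roles, declared here and assigned after the class body.
    SYSTEM: ClassVar["ChatRole"]
    USER: ClassVar["ChatRole"]
    ASSISTANT: ClassVar["ChatRole"]
    TOOL: ClassVar["ChatRole"]


# The string values are assumptions chosen for illustration.
ChatRole.SYSTEM = ChatRole("system")
ChatRole.USER = ChatRole("user")
ChatRole.ASSISTANT = ChatRole("assistant")
ChatRole.TOOL = ChatRole("tool")

# A provider-specific role needs no framework change: it is just another value.
developer_role = ChatRole("developer")

# Roles compare by value, so a role parsed from provider output matches the constant.
parsed = ChatRole("assistant")
```

Unlike a closed `Enum`, constructing a role with an unknown value does not fail, which is the decoupling the paragraph above describes.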
The response types are designed to support both structured and unstructured responses, allowing for streaming in the unstructured case.
class ChatResponse(BaseModel):
    """Represents the response to a chat request."""

    messages: list[ChatMessage]  # The chat response messages.
    additional_properties: dict[str, Any] | None = None  # Any additional properties associated with the chat response.
    conversation_id: str | None = None  # An identifier for the state of the conversation.
    created_at: CreatedAtT | None = None  # A timestamp for the chat response. (TODO: Use a datetime type?)
    finish_reason: ChatFinishReason | None = None  # The reason the chat response finished.
    model_id: str | None = None  # The model ID used in the creation of the chat response.
    raw_representation: Any | None = None  # The raw representation of the chat response from an underlying implementation.
    response_id: str | None = None  # The ID of the chat response.
    usage_details: UsageDetails | None = None  # The usage details for the chat response.

TValue = TypeVar("TValue", bound=pydantic.BaseModel)  # The value type needs to be deserializable (so we rely on Pydantic)

class StructuredResponse(GenericModel, Generic[TValue], ChatResponse):
    """Represents a structured response to a chat request."""
    value: TValue

class ChatFinishReason(BaseModel):
    """Represents the reason a chat response completed."""

    value: str

    CONTENT_FILTER: ClassVar[Self]  # type: ignore[assignment]
    """A ChatFinishReason representing the model filtering content, whether for safety, prohibited content,
    sensitive content, or other such issues."""
    LENGTH: ClassVar[Self]  # type: ignore[assignment]
    """A ChatFinishReason representing the model reaching the maximum length allowed for the request and/or
    response (typically in terms of tokens)."""
    STOP: ClassVar[Self]  # type: ignore[assignment]
    """A ChatFinishReason representing the model encountering a natural stop point or provided stop sequence."""
    TOOL_CALLS: ClassVar[Self]  # type: ignore[assignment]
    """A ChatFinishReason representing the model requesting the use of a tool that was defined in the request."""
For ease of use, ChatMessage, ChatResponse, StructuredResponse, and ChatResponseUpdate all provide a helper property, text, that extracts the text content. It only joins the TextContent instances found in the contents, separated by a single space, and ignores every other content type, including TextReasoningContent.
    @property
    def text(self) -> str:
        """Returns the concatenated text of all TextContent items in the contents."""
        return " ".join(content.text for content in self.contents if isinstance(content, TextContent))
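A small, self-contained sketch of how this helper behaves on mixed contents. The content classes here are simplified dataclass stand-ins (assumptions for illustration), not the types defined in docs/design/types.md.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the content types.
@dataclass
class TextContent:
    text: str


@dataclass
class TextReasoningContent:
    text: str


@dataclass
class ChatMessage:
    contents: list = field(default_factory=list)

    @property
    def text(self) -> str:
        # Only TextContent items contribute; everything else is skipped.
        return " ".join(c.text for c in self.contents if isinstance(c, TextContent))


message = ChatMessage(
    contents=[
        TextContent("Hello"),
        TextReasoningContent("the user greeted me, so I should greet back"),
        TextContent("world"),
    ]
)
visible_text = message.text
```

Here `visible_text` contains only the two TextContent items; the reasoning content is dropped even though it also carries a `text` attribute.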
The streaming response type is designed to represent a "differential" of a full response object, in that the relationship between the two is akin to: Join(ChatResponseUpdate) === ChatResponse.
class ChatResponseUpdate(BaseModel):
    """Represents a single streaming response chunk from a `ModelClient`."""

    contents: list[AIContent]
    """The chat response update content items."""
    additional_properties: dict[str, Any] | None = None
    """Any additional properties associated with the chat response update."""
    author_name: str | None = None
    """The name of the author of the response update."""
    conversation_id: str | None = None
    """An identifier for the state of the conversation of which this update is a part."""
    created_at: CreatedAtT | None = None  # use a datetimeoffset type?
    """A timestamp for the chat response update."""
    finish_reason: ChatFinishReason | None = None
    """The finish reason for the operation."""
    message_id: str | None = None
    """The ID of the message of which this update is a part."""
    model_id: str | None = None
    """The model ID associated with this response update."""
    raw_representation: Any | None = None
    """The raw representation of the chat response update from an underlying implementation."""
    response_id: str | None = None
    """The ID of the response of which this update is a part."""
    role: ChatRole | None = None
    """The role of the author of the response update."""
ChatResponseUpdate provides helpers to join a set of updates into a single ChatResponse object, as well as to create the next update in a stream given the content of the new update.
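The join relationship can be sketched roughly as follows. This is not the framework's helper: `join_updates` is a hypothetical function, the types are trimmed dataclass stand-ins, and the merging rules (start a new message when the message ID changes, keep the last non-`None` response ID and finish reason) are assumptions about what a reasonable join would do.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


# Hypothetical, trimmed stand-ins that carry plain text instead of content items.
@dataclass
class ChatResponseUpdate:
    text: str
    message_id: Optional[str] = None
    response_id: Optional[str] = None
    finish_reason: Optional[str] = None


@dataclass
class ChatMessage:
    message_id: Optional[str]
    text: str


@dataclass
class ChatResponse:
    messages: list[ChatMessage] = field(default_factory=list)
    response_id: Optional[str] = None
    finish_reason: Optional[str] = None


def join_updates(updates: Iterable[ChatResponseUpdate]) -> ChatResponse:
    """Fold a stream of updates into one response (assumed merging rules)."""
    response = ChatResponse()
    for update in updates:
        starts_new_message = not response.messages or (
            update.message_id is not None
            and update.message_id != response.messages[-1].message_id
        )
        if starts_new_message:
            response.messages.append(ChatMessage(update.message_id, ""))
        # Streamed chunks are appended without a separator.
        response.messages[-1].text += update.text
        if update.response_id is not None:
            response.response_id = update.response_id
        if update.finish_reason is not None:
            response.finish_reason = update.finish_reason
    return response


joined = join_updates(
    [
        ChatResponseUpdate("Hel", message_id="m1", response_id="r1"),
        ChatResponseUpdate("lo", message_id="m1"),
        ChatResponseUpdate("!", message_id="m1", finish_reason="stop"),
    ]
)
```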
ChatOptions: Configuring a Model Client request
Although the ModelClient protocol uses keyword arguments to pass options to the generate_response and generate_streaming_response methods, we also provide a ChatOptions class representing the full set of configurable options that a user could reasonably expect a ModelClient implementation to support. It also provides a convenient way to document them in one place.
class ChatOptions(TypedDict, total=False):
    """Represents the options for a chat request.

    Remarks:
        This class is here for the purposes of documentation and ease of use. Options should still
        be passed as keyword arguments to the `ModelClient.generate_response` and
        `ModelClient.generate_streaming_response` methods.
    """

    allow_multiple_tool_calls: bool | None
    """Indicates whether a single response is allowed to include multiple tool calls. If `False`,
    the `ModelClient` is asked to return a maximum of one tool call per request. If `True`, there is
    no limit. If `None`, the provider may select its own default."""
    conversation_id: str | None
    """An optional identifier used to associate a request with an existing conversation."""
    frequency_penalty: float | None
    """A penalty for repeated tokens in chat responses proportional to how many times they've appeared."""
    max_output_tokens: int | None
    """The maximum number of tokens in the generated chat response."""
    model_id: str | None
    """The model ID for the chat request."""
    presence_penalty: float | None
    """A value that influences the probability of generated tokens appearing based on their existing
    presence in generated text."""
    response_format: ChatResponseFormat | None
    """The response format for the chat request."""
    seed: int | None
    """A seed value used by a service to control the reproducibility of results."""
    stop_sequences: list[str] | None
    """The list of stop sequences."""
    temperature: float | None
    """The temperature for generating chat responses."""
    tool_mode: ChatToolMode | None
    """The tool mode for the chat request."""
    tools: list[AITool] | None
    """The list of tools to include with a chat request."""
    top_k: int | None
    """The number of most probable tokens that the model considers when generating the next part of
    the text."""
    top_p: float | None
    """The 'nucleus sampling' factor (or "top p") for generating chat responses."""
ChatResponseFormat and ChatToolMode are used to configure the output format (structured or not) and tool use mode.
class ChatResponseFormatJson(BaseModel):
    """Represents a response format for structured JSON data."""

    type: Literal["json"] = "json"
    schema_name: str | None = None
    """The name of the schema."""
    schema_description: str | None = None
    """The description of the schema."""
    schema_: dict[str, Any] | None = Field(default=None, alias="schema")
    """The JSON schema associated with the response, or `None` if there is none."""

class ChatResponseFormatText(BaseModel):
    """Represents a response format with no constraints around the format."""

    type: Literal["text"] = "text"

ChatResponseFormat = ChatResponseFormatJson | ChatResponseFormatText

class AutoChatToolMode(BaseModel):
    """Indicates that a `ModelClient` is free to select any of the available tools, or none at all."""

    value: Literal["auto"] = "auto"

class NoneChatToolMode(BaseModel):
    """Indicates that a `ModelClient` should not request the invocation of any tools."""

    value: Literal["none"] = "none"

class RequiredChatToolMode(BaseModel):
    """Represents a mode where a chat tool must be called.

    This class can optionally nominate a specific function or indicate that any of the functions can be
    selected.
    """

    value: Literal["require_any", "require_specific"] = "require_any"
    required_function_name: str | None = None
    """The name of a specific function that must be called."""

ChatToolMode = AutoChatToolMode | NoneChatToolMode | RequiredChatToolMode
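One way a client implementation might consume the tool mode union is to translate it into whatever its provider expects. The sketch below is only an illustration: the mode classes are dataclass stand-ins for the Pydantic models, `to_provider_tool_choice` is a hypothetical helper, and the output shape (`"required"` or a small dict naming the function) is an invented provider format, not any particular service's API.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union


# Dataclass stand-ins mirroring the tool mode models.
@dataclass
class AutoChatToolMode:
    value: Literal["auto"] = "auto"


@dataclass
class NoneChatToolMode:
    value: Literal["none"] = "none"


@dataclass
class RequiredChatToolMode:
    value: Literal["require_any", "require_specific"] = "require_any"
    required_function_name: Optional[str] = None


ChatToolMode = Union[AutoChatToolMode, NoneChatToolMode, RequiredChatToolMode]


def to_provider_tool_choice(mode: ChatToolMode) -> Union[str, dict[str, str]]:
    """Map a tool mode onto a hypothetical provider 'tool choice' setting."""
    if isinstance(mode, RequiredChatToolMode):
        if mode.required_function_name is not None:
            # A specific function was nominated.
            return {"type": "function", "name": mode.required_function_name}
        # Any tool may be chosen, but one must be called.
        return "required"
    # The auto and none modes pass their literal value straight through.
    return mode.value


auto_choice = to_provider_tool_choice(AutoChatToolMode())
any_choice = to_provider_tool_choice(RequiredChatToolMode())
specific_choice = to_provider_tool_choice(
    RequiredChatToolMode(value="require_specific", required_function_name="get_weather")
)
```

Because each mode is its own small type, the translation can branch on the class rather than parse strings, and a provider-specific mode could be added as another member of the union.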