Fireworks prompt caching is enabled by default (automatic prefix matching),
but on serverless infrastructure requests land on random replicas. Without
session affinity, follow-up requests tend to miss the per-replica cache,
which wipes out the cache hit rate and the discounted cacheRead pricing.
Changes:
- Add sendSessionAffinityHeaders and supportsCacheControlOnTools
to AnthropicMessagesCompat interface
- Send x-session-affinity header for Fireworks (and Cloudflare AI
Gateway Anthropic) when sessionId is available and caching is enabled
- Omit cache_control on tool definitions for Fireworks (unsupported
per https://docs.fireworks.ai/tools-sdks/anthropic-compatibility)
- Default supportsEagerToolInputStreaming to false for Fireworks
(unsupported field)
- Default supportsLongCacheRetention to false for Fireworks
(cache_control.ttl not supported)
- Add compat settings to Fireworks models in generate-models.ts
- Update generated models with Fireworks compat settings
- Add integration tests for session affinity and tool compat
Refs: https://docs.fireworks.ai/guides/prompt-caching
Refs: https://docs.fireworks.ai/tools-sdks/anthropic-compatibility
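A minimal TypeScript sketch of how these compat flags could fit together. Only the four compat field names come from the changes above. The types, the helpers `resolveCompat`, `sessionAffinityHeaders` and `prepareTools`, the `"fireworks"` provider id, provider-based defaulting, and the use of the session id itself as the header value are all assumptions.

```typescript
// Hypothetical sketch: only the four compat field names come from the commit.
interface AnthropicMessagesCompat {
  sendSessionAffinityHeaders?: boolean;
  supportsCacheControlOnTools?: boolean;
  supportsEagerToolInputStreaming?: boolean;
  supportsLongCacheRetention?: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: { type: "ephemeral"; ttl?: string };
}

// Explicit per-model compat wins; otherwise fall back to provider defaults
// (assumed: Fireworks turns off the fields it does not support).
function resolveCompat(
  provider: string,
  compat: AnthropicMessagesCompat = {},
): Required<AnthropicMessagesCompat> {
  const fireworks = provider === "fireworks";
  return {
    sendSessionAffinityHeaders: compat.sendSessionAffinityHeaders ?? fireworks,
    supportsCacheControlOnTools: compat.supportsCacheControlOnTools ?? !fireworks,
    supportsEagerToolInputStreaming: compat.supportsEagerToolInputStreaming ?? !fireworks,
    supportsLongCacheRetention: compat.supportsLongCacheRetention ?? !fireworks,
  };
}

// Pin the request to one replica only when there is a session to pin and
// caching is enabled. Assumes the session id itself is the header value.
function sessionAffinityHeaders(
  compat: Required<AnthropicMessagesCompat>,
  sessionId: string | undefined,
  cachingEnabled: boolean,
): Record<string, string> {
  if (!compat.sendSessionAffinityHeaders || !sessionId || !cachingEnabled) return {};
  return { "x-session-affinity": sessionId };
}

// Strip cache_control from tool definitions for upstreams that reject it.
function prepareTools(
  tools: ToolDefinition[],
  compat: Required<AnthropicMessagesCompat>,
): ToolDefinition[] {
  if (compat.supportsCacheControlOnTools) return tools;
  return tools.map((tool) => {
    const copy: ToolDefinition = { ...tool };
    delete copy.cache_control;
    return copy;
  });
}
```

Keeping the behavior behind compat flags, rather than branching on the provider name at each call site, lets other Anthropic-compatible routes opt in by setting the flag on their catalog entries.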
The built-in `xiaomi` provider now targets the API billing endpoint (https://api.xiaomimimo.com/anthropic), a single stable URL for keys issued at platform.xiaomimimo.com. The Token Plan endpoints are exposed as three sibling providers, each with its own env var:
- xiaomi-token-plan-cn: XIAOMI_TOKEN_PLAN_CN_API_KEY
- xiaomi-token-plan-ams: XIAOMI_TOKEN_PLAN_AMS_API_KEY
- xiaomi-token-plan-sgp: XIAOMI_TOKEN_PLAN_SGP_API_KEY
BREAKING CHANGE: users who previously set XIAOMI_API_KEY against the Token Plan AMS endpoint must move to xiaomi-token-plan-ams and set XIAOMI_TOKEN_PLAN_AMS_API_KEY. This also resolves the 401 reported in #4005, where a platform.xiaomimimo.com key fails against the Token Plan endpoint.
closes #4082
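A small TypeScript sketch of the resulting provider-to-key mapping. The three Token Plan variable names come from the list above. `XIAOMI_API_KEY` for the base `xiaomi` provider is inferred from the breaking-change note. The `XIAOMI_ENV_KEYS` table and the `xiaomiApiKey` helper are hypothetical. In this sketch each provider reads only its own variable, so a platform.xiaomimimo.com key is never sent to a Token Plan endpoint by fallback.

```typescript
// Hypothetical provider -> env var table. XIAOMI_API_KEY for `xiaomi` is
// inferred from the breaking-change note, not confirmed.
const XIAOMI_ENV_KEYS = {
  "xiaomi": "XIAOMI_API_KEY",
  "xiaomi-token-plan-cn": "XIAOMI_TOKEN_PLAN_CN_API_KEY",
  "xiaomi-token-plan-ams": "XIAOMI_TOKEN_PLAN_AMS_API_KEY",
  "xiaomi-token-plan-sgp": "XIAOMI_TOKEN_PLAN_SGP_API_KEY",
} as const;

type XiaomiProvider = keyof typeof XIAOMI_ENV_KEYS;

// Look up the key for one provider. There is deliberately no fallback to
// XIAOMI_API_KEY for the Token Plan providers.
function xiaomiApiKey(
  provider: XiaomiProvider,
  env: Record<string, string | undefined>,
): string | undefined {
  return env[XIAOMI_ENV_KEYS[provider]];
}
```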
* feat(ai): add Cloudflare AI Gateway as a provider
Routes through Cloudflare's Unified API (`/compat`) for Workers AI and
Anthropic models, and through the provider-specific `/openai` subpath
for OpenAI models so reasoning models (gpt-5.x, o-series) can hit
`/v1/responses` natively. Once `/compat` adds Responses-API support,
the OpenAI subpath can be folded back in.
Catalog layout:
  workers-ai/@cf/...  -> openai-completions, gateway/.../compat
  anthropic/...       -> openai-completions, gateway/.../compat
  <native-id>         -> openai-responses,   gateway/.../openai
                         (gpt-5.x and o-series only; the openai/ prefix is
                         stripped because the OpenAI SDK posts native ids)
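The routing rule can be sketched in TypeScript as follows. The `workers-ai/`, `anthropic/` and `openai/` prefixes and the gpt-5.x/o-series restriction come from the layout above. `routeGatewayModel`, the `GatewayRoute` shape, the regex used to spot reasoning models, and returning `undefined` for other OpenAI ids are assumptions. The gateway base URL is left out because only its `/compat` and `/openai` suffixes are given here.

```typescript
// Hypothetical sketch of the catalog routing described above.
interface GatewayRoute {
  api: "openai-completions" | "openai-responses";
  subpath: "compat" | "openai";
  id: string; // model id sent upstream
}

function routeGatewayModel(upstreamId: string): GatewayRoute | undefined {
  if (upstreamId.startsWith("workers-ai/") || upstreamId.startsWith("anthropic/")) {
    // Unified /compat endpoint: keep the prefixed id so the gateway can
    // pick the upstream.
    return { api: "openai-completions", subpath: "compat", id: upstreamId };
  }
  if (upstreamId.startsWith("openai/")) {
    // The OpenAI SDK posts native ids, so drop the "openai/" prefix.
    const nativeId = upstreamId.slice("openai/".length);
    // Assumed check: gpt-5.x and o-series go through /openai + Responses API.
    if (/^(gpt-5|o\d)/.test(nativeId)) {
      return { api: "openai-responses", subpath: "openai", id: nativeId };
    }
  }
  // Anything else is left out of the catalog in this sketch.
  return undefined;
}
```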
Touches:
  packages/ai/src/types.ts                      add cloudflare-ai-gateway to KnownProvider
  packages/ai/src/env-api-keys.ts               map to CLOUDFLARE_API_KEY
  packages/ai/src/providers/cloudflare.ts       add CLOUDFLARE_AI_GATEWAY_COMPAT_BASE_URL
                                                and CLOUDFLARE_AI_GATEWAY_OPENAI_BASE_URL
  packages/ai/src/providers/openai-responses.ts one-line dispatch through resolveCloudflareBaseUrl
                                                (matches what openai-completions.ts already does)
  packages/ai/scripts/generate-models.ts        branch openai/* vs workers-ai/* and anthropic/*
  packages/ai/src/models.generated.ts           spliced 34 entries
  packages/ai/test/stream.test.ts               3 e2e blocks (one per upstream)
  packages/coding-agent/*                       defaultModelPerProvider, login, env docs,
                                                README, providers.md
Verified end-to-end against a real Cloudflare account with unified
billing: 9/9 e2e tests pass across all three upstreams (Workers AI
Kimi K2.6, OpenAI gpt-5.1 reasoning, Anthropic claude-sonnet-4-5).
* refactor(ai): move AI Gateway User-Agent and per-route session-affinity flag to catalog
Mirrors the same per-model metadata refactor done for Workers AI in the
parent branch. All cloudflare-ai-gateway entries get the User-Agent
header. Only workers-ai/* gateway entries set
`compat.sendSessionAffinityHeaders: true` because the gateway
forwards that header to the underlying Workers AI runtime; the anthropic/*
and openai/* (openai-responses) upstreams don't use it.
packages/ai/scripts/generate-models.ts: emit headers (always) and
per-upstream compat (workers-ai only) on each cloudflare-ai-gateway
entry.
packages/ai/src/models.generated.ts: re-spliced 35 entries with
headers + conditional compat.
Behavior unchanged: 9/9 e2e tests pass across all three upstream
families.
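A hypothetical TypeScript sketch of what the generator emits for each entry at this commit. The `User-Agent: pi-coding-agent` value is the one named in the follow-up fix. `GatewayCatalogEntry`, `GATEWAY_HEADERS` and `buildGatewayEntry` are illustrative names. Only `compat.sendSessionAffinityHeaders` and the workers-ai-only rule come from the text.

```typescript
// Hypothetical shape of one generated cloudflare-ai-gateway entry.
interface GatewayCatalogEntry {
  id: string;
  provider: "cloudflare-ai-gateway";
  headers: Record<string, string>;
  compat?: { sendSessionAffinityHeaders: boolean };
}

// Illustrative stand-in for the static header constant.
const GATEWAY_HEADERS: Record<string, string> = { "User-Agent": "pi-coding-agent" };

function buildGatewayEntry(id: string): GatewayCatalogEntry {
  // Every gateway entry carries the User-Agent header.
  const entry: GatewayCatalogEntry = {
    id,
    provider: "cloudflare-ai-gateway",
    headers: { ...GATEWAY_HEADERS },
  };
  // Only Workers AI consumes the forwarded affinity header, so only
  // workers-ai/* routes opt in.
  if (id.startsWith("workers-ai/")) {
    entry.compat = { sendSessionAffinityHeaders: true };
  }
  return entry;
}
```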
* fix(ai): align AI Gateway with telemetry-aware UA helper
Adapts to badlogic/pi-mono#3851's follow-up fix ("honor telemetry for
Cloudflare attribution headers", fbb5eed), which moved the
'User-Agent: pi-coding-agent' header out of per-model catalog metadata
and into a centralized telemetry-honoring helper
(coding-agent/src/core/sdk.ts:getAttributionHeaders).
- packages/coding-agent/src/core/sdk.ts: extend the cloudflare branch of
getAttributionHeaders to also match cloudflare-ai-gateway and
gateway.ai.cloudflare.com.
- packages/ai/scripts/generate-models.ts and src/models.generated.ts:
drop 'headers' from the 35 cloudflare-ai-gateway entries (constant
CLOUDFLARE_STATIC_HEADERS no longer exists). Per-route
compat.sendSessionAffinityHeaders is unchanged.
End-to-end behavior unchanged: 9/9 tests still pass across all three
upstream families (Workers AI, Anthropic, OpenAI Responses).
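A hypothetical TypeScript sketch of the extended Cloudflare branch. Three details come from the text above: the match on the `cloudflare-ai-gateway` provider id, the match on the `gateway.ai.cloudflare.com` host, and the `pi-coding-agent` User-Agent value. `ModelTarget`, `attributionHeadersSketch`, the boolean telemetry flag and the substring host check are assumptions. The real `getAttributionHeaders` in sdk.ts may have a different signature. The existing Workers AI match is omitted.

```typescript
// Hypothetical: only the two new Cloudflare AI Gateway matches are shown.
interface ModelTarget {
  provider: string;
  baseUrl: string;
}

function isCloudflareGateway(model: ModelTarget): boolean {
  return (
    model.provider === "cloudflare-ai-gateway" ||
    model.baseUrl.includes("gateway.ai.cloudflare.com")
  );
}

function attributionHeadersSketch(
  model: ModelTarget,
  telemetryEnabled: boolean,
): Record<string, string> {
  // Honor the telemetry opt-out: send no attribution header at all.
  if (!telemetryEnabled) return {};
  if (isCloudflareGateway(model)) return { "User-Agent": "pi-coding-agent" };
  return {};
}
```

Matching on the host as well as the provider id presumably means a custom provider pointed at the gateway URL is attributed the same way.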
---------
Co-authored-by: Mario Zechner <badlogicgames@gmail.com>
Adds `AssistantMessage.responseModel` on the openai-completions path:
surfaces the concrete `chunk.model` when it differs from the requested
id (e.g. OpenRouter `auto` -> `anthropic/...`).
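A minimal TypeScript sketch of the comparison. It assumes the streamed chunks expose an optional `model` field, as OpenAI-style chat completion chunks do. `ChunkLike`, `AssistantMessageLike` and `applyResponseModel` are hypothetical names. Only `responseModel` and `chunk.model` come from the text above.

```typescript
// Hypothetical minimal shapes; only `model`/`responseModel` mirror the text.
interface ChunkLike {
  model?: string;
}

interface AssistantMessageLike {
  model: string; // requested model id
  responseModel?: string; // concrete model reported by the upstream
}

function applyResponseModel(
  message: AssistantMessageLike,
  chunks: ChunkLike[],
): AssistantMessageLike {
  for (const chunk of chunks) {
    // Record the served model only when it differs from what was requested.
    if (chunk.model && chunk.model !== message.model) {
      message.responseModel = chunk.model;
    }
  }
  return message;
}
```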