Fireworks prompt caching is enabled by default (automatic prefix matching),
but on serverless infrastructure, requests hit random replicas. Without
session affinity, the per-replica cache misses, negating cache hit rates
and the discounted cacheRead pricing.
Changes:
- Add sendSessionAffinityHeaders and supportsCacheControlOnTools
to AnthropicMessagesCompat interface
- Send x-session-affinity header for Fireworks (and Cloudflare AI
Gateway Anthropic) when sessionId is available and caching is enabled
- Omit cache_control on tool definitions for Fireworks (unsupported
per https://docs.fireworks.ai/tools-sdks/anthropic-compatibility)
- Default supportsEagerToolInputStreaming to false for Fireworks
(unsupported field)
- Default supportsLongCacheRetention to false for Fireworks
(cache_control.ttl not supported)
- Add compat settings to Fireworks models in generate-models.ts
- Update generated models with Fireworks compat settings
- Add integration tests for session affinity and tool compat
Refs: https://docs.fireworks.ai/guides/prompt-caching
Refs: https://docs.fireworks.ai/tools-sdks/anthropic-compatibility