Mirror of https://github.com/earendil-works/pi.git (synced 2026-06-18 15:54:04 +08:00)
99dc6fcec8
Fireworks prompt caching is enabled by default (automatic prefix matching), but on serverless infrastructure requests hit random replicas. Without session affinity, requests miss the per-replica cache, negating the cache hit rate and the discounted cacheRead pricing.

Changes:
- Add sendSessionAffinityHeaders and supportsCacheControlOnTools to the AnthropicMessagesCompat interface
- Send the x-session-affinity header for Fireworks (and Cloudflare AI Gateway Anthropic) when a sessionId is available and caching is enabled
- Omit cache_control on tool definitions for Fireworks (unsupported per https://docs.fireworks.ai/tools-sdks/anthropic-compatibility)
- Default supportsEagerToolInputStreaming to false for Fireworks (unsupported field)
- Default supportsLongCacheRetention to false for Fireworks (cache_control.ttl is not supported)
- Add compat settings to Fireworks models in generate-models.ts
- Update generated models with the Fireworks compat settings
- Add integration tests for session affinity and tool compat

Refs: https://docs.fireworks.ai/guides/prompt-caching
Refs: https://docs.fireworks.ai/tools-sdks/anthropic-compatibility
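The header and tool-definition rules in this commit can be sketched roughly as follows. This is a minimal TypeScript sketch, not pi's actual implementation: the interface shape, the `ToolDefinition` type, and the helper names `buildRequestHeaders` and `buildTools` are hypothetical. Only the compat field names, the `x-session-affinity` header, and `cache_control` come from the commit message. The sketch assumes that an unset compat flag means the provider behaves like Anthropic's own API.

```typescript
// Hypothetical compat shape built from the field names in the commit message;
// the real AnthropicMessagesCompat interface in pi may differ.
interface AnthropicMessagesCompat {
  sendSessionAffinityHeaders?: boolean;
  supportsCacheControlOnTools?: boolean;
  supportsEagerToolInputStreaming?: boolean; // listed for completeness, not exercised here
  supportsLongCacheRetention?: boolean;
}

// Hypothetical tool definition in the Anthropic Messages style.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

// Assumed Fireworks settings, following the defaults described in the commit.
const fireworksCompat: AnthropicMessagesCompat = {
  sendSessionAffinityHeaders: true,
  supportsCacheControlOnTools: false,
  supportsEagerToolInputStreaming: false,
  supportsLongCacheRetention: false,
};

// Adds the affinity header only when the provider wants it, a session id
// exists, and caching is on, so every request in a session can reach the same replica.
function buildRequestHeaders(
  compat: AnthropicMessagesCompat,
  sessionId: string | undefined,
  cachingEnabled: boolean,
): Record<string, string> {
  const headers: Record<string, string> = {};
  if (compat.sendSessionAffinityHeaders && sessionId && cachingEnabled) {
    headers["x-session-affinity"] = sessionId;
  }
  return headers;
}

// Marks the last tool as a cache breakpoint unless the provider rejects
// cache_control on tools; adds a 1h ttl only if long retention is supported.
function buildTools(
  tools: ToolDefinition[],
  compat: AnthropicMessagesCompat,
  longRetention: boolean,
): ToolDefinition[] {
  // Shallow-copy each tool so the caller's definitions are not mutated.
  const copies: ToolDefinition[] = tools.map((t) => ({ ...t }));
  if (compat.supportsCacheControlOnTools === false || copies.length === 0) {
    return copies; // e.g. Fireworks: send tool definitions without cache_control
  }
  const last = copies[copies.length - 1];
  last.cache_control =
    longRetention && compat.supportsLongCacheRetention !== false
      ? { type: "ephemeral", ttl: "1h" }
      : { type: "ephemeral" };
  return copies;
}
```

Tying the affinity header to both a session id and enabled caching means requests that cannot benefit from a warm prefix cache are still load-balanced freely across replicas.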
2026-05-10 00:11:36 +02:00