Mercury 2 in instant mode (reasoning_effort: "none") disables tool calling.
The openai-completions provider hardcodes {reasoning:{effort:"none"}} when no
explicit reasoning level is passed and thinkingLevelMap.off isn't null
(openai-completions.ts:575), so every caller that doesn't opt in to a level
silently breaks Mercury 2's agentic use cases.
Setting thinkingLevelMap.off = null on the Mercury 2 catalog entry causes the
provider to omit the reasoning param entirely, letting Mercury 2's own default
take over. Low/medium/high pass through verbatim; OpenRouter normalizes them
to Mercury's vocabulary.
Prefix-matched on "inception/mercury-2" so future Mercury 2 variants on
OpenRouter inherit the fix.
Fireworks prompt caching is enabled by default (automatic prefix matching),
but on serverless infrastructure, requests hit random replicas. Without
session affinity, the per-replica cache misses, negating cache hit rates
and the discounted cacheRead pricing.
Changes:
- Add sendSessionAffinityHeaders and supportsCacheControlOnTools
to AnthropicMessagesCompat interface
- Send x-session-affinity header for Fireworks (and Cloudflare AI
Gateway Anthropic) when sessionId is available and caching is enabled
- Omit cache_control on tool definitions for Fireworks (unsupported
per https://docs.fireworks.ai/tools-sdks/anthropic-compatibility)
- Default supportsEagerToolInputStreaming to false for Fireworks
(unsupported field)
- Default supportsLongCacheRetention to false for Fireworks
(cache_control.ttl not supported)
- Add compat settings to Fireworks models in generate-models.ts
- Update generated models with Fireworks compat settings
- Add integration tests for session affinity and tool compat
Refs: https://docs.fireworks.ai/guides/prompt-caching
Refs: https://docs.fireworks.ai/tools-sdks/anthropic-compatibility