Adds support for the `-reasoning` model name suffix which enables
thinking/reasoning mode with dynamic budget. This allows clients to
request reasoning-enabled inference using model names like
`gemini-2.5-flash-reasoning` without explicit configuration.
The suffix is normalized to the base model (e.g., `gemini-2.5-flash`)
with `thinkingBudget=-1` (dynamic) and `include_thoughts=true`.
Follows the existing pattern established by the `-nothinking` and
`-thinking-N` suffixes.
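A minimal sketch of the normalization, assuming a small helper along these lines (the function name and return shape are illustrative, not the project's actual API):

```go
package thinking

import "strings"

// normalizeReasoningSuffix is an illustrative helper: it strips a trailing
// "-reasoning" from the requested model name and returns the thinking
// settings the suffix implies.
func normalizeReasoningSuffix(model string) (base string, thinkingBudget int, includeThoughts bool) {
	const suffix = "-reasoning"
	if strings.HasSuffix(model, suffix) {
		// -1 requests a dynamic thinking budget; thoughts are included in the output.
		return strings.TrimSuffix(model, suffix), -1, true
	}
	return model, 0, false
}
```

With this shape, `gemini-2.5-flash-reasoning` resolves to `gemini-2.5-flash` with a dynamic budget and thoughts enabled.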
- Updated `usage_helpers.go` to call `EnsureIndex()` for proper index assignment in reporter initialization.
- Adjusted `auth/manager.go` to assign auth indices inside a locked section when they are unassigned, ensuring thread safety and consistency.
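A rough sketch of the locked assignment described above; the `Manager` and auth field names here are simplified assumptions, not the real types:

```go
package auth

import "sync"

// Auth and Manager are simplified stand-ins for the real types.
type Auth struct {
	Index         int
	indexAssigned bool
}

type Manager struct {
	mu        sync.Mutex
	auths     []*Auth
	nextIndex int
}

// ensureIndices assigns an index to every auth entry that does not have one
// yet, inside the lock, so concurrent readers never see a half-assigned entry.
func (m *Manager) ensureIndices() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, a := range m.auths {
		if !a.indexAssigned {
			a.Index = m.nextIndex
			a.indexAssigned = true
			m.nextIndex++
		}
	}
}
```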
AMP CLI requests /threads.rss at the root level, but the AMP module
only registered routes under /api/*. This caused a 404 error during
AMP CLI startup.
Add the missing root-level route with the same security middleware
(noCORS, optional localhost restriction) as other management routes.
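A sketch of the idea using `net/http` (the project's actual router and middleware names will differ); the point is that `/threads.rss` is mounted at the root with the same wrappers as the `/api/*` management routes:

```go
package amp

import "net/http"

// middleware is a plain handler wrapper; the noCORS and localhost-only
// middleware referenced above would be passed in through the chain.
type middleware func(http.Handler) http.Handler

// registerThreadsRSS is illustrative: it mounts the same handler at the root
// path the AMP CLI actually requests, wrapped with the same middleware chain
// used for the /api/* management routes.
func registerThreadsRSS(mux *http.ServeMux, threadsRSS http.Handler, chain ...middleware) {
	h := threadsRSS
	for i := len(chain) - 1; i >= 0; i-- {
		h = chain[i](h)
	}
	mux.Handle("/threads.rss", h)     // root-level route the CLI expects
	mux.Handle("/api/threads.rss", h) // existing management prefix
}
```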
- Add model mapping feature to README.md Amp CLI section
- Add detailed Model Mapping Configuration section to amp-cli-integration.md
- Update architecture diagram to show model mapping flow
- Update Model Fallback Behavior to include mapping step
- Add Table of Contents entry for model mapping
- Add AmpModelMapping config to route models like 'claude-opus-4.5' to 'claude-sonnet-4'
- Add ModelMapper interface and DefaultModelMapper implementation with hot-reload support
- Enhance FallbackHandler to apply model mappings before falling back to ampcode.com
- Add structured logging for routing decisions (local provider, mapping, amp credits)
- Update config.example.yaml with amp-model-mappings documentation
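A sketch of what the mapping layer could look like; the interface and struct names follow the bullets above, but the fields and locking are simplified assumptions:

```go
package amp

import "sync"

// ModelMapper resolves a requested model name to the one that should be served.
type ModelMapper interface {
	Map(model string) string
}

// DefaultModelMapper is a simplified sketch of the hot-reloadable implementation.
type DefaultModelMapper struct {
	mu       sync.RWMutex
	mappings map[string]string // e.g. "claude-opus-4.5" -> "claude-sonnet-4"
}

// Reload swaps in a new mapping table, e.g. after the config file changes on disk.
func (m *DefaultModelMapper) Reload(mappings map[string]string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.mappings = mappings
}

// Map returns the mapped model if one is configured, otherwise the original name.
func (m *DefaultModelMapper) Map(model string) string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if target, ok := m.mappings[model]; ok {
		return target
	}
	return model
}
```

The fallback handler would call `Map` before deciding whether a local provider can serve the request or the call should fall back to ampcode.com.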
- Updated `antigravity_openai_request.go` to handle non-JSON outputs gracefully by distinguishing JSON from plain string formats.
- Ensured the parsed or raw response is assigned to `functionResponse` accordingly (see the sketch below).
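A hedged sketch of that distinction using gjson/sjson (the `functionResponse` path and function name are placeholders; `out` stands for the tool call's output field):

```go
package translator

import (
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// setFunctionResponse is illustrative: it keeps structured JSON output as raw
// JSON and stores plain text as a JSON string instead of failing on it.
func setFunctionResponse(payload string, out gjson.Result) (string, error) {
	if out.Type == gjson.String {
		// The output arrived as a string; it may itself contain JSON text.
		if gjson.Valid(out.Str) {
			return sjson.SetRaw(payload, "functionResponse.response.output", out.Str)
		}
		// Plain text: store it as a JSON string value.
		return sjson.Set(payload, "functionResponse.response.output", out.Str)
	}
	// Already structured JSON: preserve the original encoding.
	return sjson.SetRaw(payload, "functionResponse.response.output", out.Raw)
}
```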
- Added `ContextLength` field with a value of 200,000 to all applicable Claude model definitions.
- Standardized `MaxCompletionTokens` values across models for consistency.
**fix(translator): add support for "xhigh" reasoning effort in OpenAI responses**
- Updated handling in `openai_openai-responses_request.go` to include the new "xhigh" reasoning effort level.
Fixes an issue where Claude thinking models returned 400 errors whenever
`thinking.budget_tokens` was greater than or equal to `max_tokens`.
Changes:
- Add MaxCompletionTokens: 128000 to all Claude thinking model definitions
- Add ensureMaxTokensForThinking() function in claude_executor.go that:
- Checks if thinking is enabled with a budget_tokens value
- Looks up the model's MaxCompletionTokens from the registry
- Ensures max_tokens is set to at least the model's MaxCompletionTokens
- Falls back to budget_tokens + 4000 buffer if registry lookup fails
This ensures the Anthropic API constraint (`max_tokens` > `thinking.budget_tokens`)
is always satisfied when extended thinking is used; a sketch of the helper follows.
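A simplified sketch of that helper (the registry lookup is reduced to a parameter and the JSON paths are assumptions; the real `claude_executor.go` version will differ in detail):

```go
package executor

import (
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// ensureMaxTokensForThinking bumps max_tokens when extended thinking is enabled
// so the Anthropic constraint max_tokens > thinking.budget_tokens always holds.
func ensureMaxTokensForThinking(body []byte, registryMaxCompletionTokens int64) []byte {
	budget := gjson.GetBytes(body, "thinking.budget_tokens")
	if !budget.Exists() {
		return body // thinking not enabled, nothing to do
	}
	target := registryMaxCompletionTokens // e.g. 128000 from the model registry
	if target <= 0 {
		target = budget.Int() + 4000 // fallback buffer when the registry lookup fails
	}
	if gjson.GetBytes(body, "max_tokens").Int() < target {
		if updated, err := sjson.SetBytes(body, "max_tokens", target); err == nil {
			return updated
		}
	}
	return body
}
```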
Fixes: #339
- Implemented logic to pair consecutive function calls with their outputs so they are processed in the correct sequence.
- Adjusted `gemini_openai-responses_request.go` to normalize message structures and preserve the expected ordering.
- Updated handling of `output` in `gemini_openai-responses_request.go` to use `.Str` instead of `.Raw` when parsing non-JSON string outputs.
- Added checks to distinguish between JSON and non-JSON `output` types for accurate `functionResponse` construction.
- Updated handling of `output` in `gemini_openai-responses_request.go` to use `.Raw` instead of `.String` for preserving original JSON encoding.
- Ensured proper setting of raw JSON output when constructing `functionResponse`.
- Updated handling of `thoughtSignature` across all translator modules to retain other content payloads if present.
- Adjusted logic for `thought_signature` and `inline_data` keys for consistent processing.
- Added new `Gemini 3 Pro Image Preview` model with detailed metadata and configuration.
- Removed outdated `Claude Sonnet 4.5 Thinking` model definition for cleanup and relevance.
**feat(handlers, executor): add Gemini 3 Pro Preview support and refine Claude system instructions**
- Added support for the new "Gemini 3 Pro Preview" action in Gemini handlers, including detailed metadata and configuration.
- Removed redundant `cache_control` field from Claude system instructions for cleaner payload structure.
**fix(executor): replace redundant commented code with `checkSystemInstructions` helper**
- Replaced commented-out `sjson.SetRawBytes` lines with the new `checkSystemInstructions` function.
- Centralized system instruction handling for better code clarity and reuse.
- Ensured consistent logic for managing `system` field across Claude executor flows.
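A rough sketch of what such a helper might look like (the default instruction payload and field paths are placeholders):

```go
package executor

import (
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// defaultClaudeSystem stands in for the canned Claude system instructions.
var defaultClaudeSystem = []byte(`[{"type":"text","text":"You are Claude..."}]`)

// checkSystemInstructions is illustrative: it sets the system field in one
// place instead of repeating sjson.SetRawBytes at every call site.
func checkSystemInstructions(body []byte) []byte {
	if gjson.GetBytes(body, "system").Exists() {
		return body // the caller already supplied system instructions
	}
	if updated, err := sjson.SetRawBytes(body, "system", defaultClaudeSystem); err == nil {
		return updated
	}
	return body
}
```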
- Commented out several redundant `sjson.SetRawBytes` calls that set the `system` key to the Claude instructions.
- Code cleanup to improve clarity and maintainability without affecting functionality.
- Add Claude thinking model definitions (sonnet-4-5-thinking, opus-4-5-thinking variants)
- Add Thinking support for antigravity models with -thinking suffix
- Add injectThinkingConfig() for automatic thinking budget based on model suffix
- Add resolveUpstreamModel() mappings for thinking variants to actual Claude models
- Add extractAndRemoveBetas() to convert betas array to anthropic-beta header
- Update applyClaudeHeaders() to merge custom betas from request body
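A sketch of the betas handling named above (the request shape and merge order are assumptions):

```go
package executor

import (
	"net/http"
	"strings"

	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// extractAndRemoveBetas lifts the request body's betas array into the
// anthropic-beta header and strips it from the payload, merging with any
// betas already present on the header.
func extractAndRemoveBetas(body []byte, header http.Header) []byte {
	betasField := gjson.GetBytes(body, "betas")
	if !betasField.IsArray() {
		return body
	}
	betas := []string{}
	if existing := header.Get("anthropic-beta"); existing != "" {
		betas = append(betas, existing)
	}
	for _, b := range betasField.Array() {
		betas = append(betas, b.String())
	}
	header.Set("anthropic-beta", strings.Join(betas, ","))
	if updated, err := sjson.DeleteBytes(body, "betas"); err == nil {
		return updated
	}
	return body
}
```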
Closes #324
- Introduced `appendAPIResponse` helper to preserve and append data to existing API responses.
- Ensured a separating newline is added when appending, where needed.
- Improved `nil` and data type checks for response handling.
- Updated middleware to skip request logging for `GET` requests.
- Added additional metadata fields (`Name`, `Description`, `DisplayName`, `Version`) to `ModelInfo` struct initialization for better model representation.
- Removed unnecessary whitespace in the code.
- Ensured `functionDeclarations` key renaming only occurs if the key exists in Gemini tools processing.
- Prevented unnecessary JSON reassignment when the target key is absent.
- Fixed parameter key renaming to correctly handle `functionDeclarations` and `parametersJsonSchema` in Gemini tools.
- Resolved potential overwriting issue by reassigning JSON strings after each key rename.
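A sketch of the rename pattern; the key point of the fix is that each rename's result must be reassigned before the next key is touched:

```go
package translator

import (
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// renameKey is illustrative: sjson has no rename primitive, so the value is
// copied to the new key and the old key is deleted, returning the updated doc.
func renameKey(doc, from, to string) string {
	val := gjson.Get(doc, from)
	if !val.Exists() {
		return doc // skip the rewrite entirely when the key is absent
	}
	doc, _ = sjson.SetRaw(doc, to, val.Raw)
	doc, _ = sjson.Delete(doc, from)
	return doc
}
```

Each call returns a new JSON string that the caller must reassign before renaming the next key (e.g. `functionDeclarations`, then `parametersJsonSchema`); otherwise the second rename operates on a stale document.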
- Introduced `TLSConfig` to support HTTPS, including options for enabling TLS and specifying the certificate and key files.
- Updated HTTP server logic to handle HTTPS mode when TLS is enabled.
- Enhanced `config.example.yaml` with TLS settings example.
- Adjusted internal URL generation to respect protocol based on TLS state.
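A minimal sketch of the server-side switch, assuming a `TLSConfig` with fields like those described (names are illustrative):

```go
package server

import "net/http"

// TLSConfig mirrors the kind of settings added to config.example.yaml.
type TLSConfig struct {
	Enabled  bool   `yaml:"enabled"`
	CertFile string `yaml:"cert-file"`
	KeyFile  string `yaml:"key-file"`
}

// listen serves over HTTPS when TLS is enabled and plain HTTP otherwise;
// internal URL generation would pick https:// or http:// from the same flag.
func listen(srv *http.Server, tls TLSConfig) error {
	if tls.Enabled {
		return srv.ListenAndServeTLS(tls.CertFile, tls.KeyFile)
	}
	return srv.ListenAndServe()
}
```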
- Fixed improper handling of `indexAssigned` and `Index` during auth reassignment.
- Ensured `EnsureIndex` is invoked after validating existing auth entries.
**feat(retry): add configurable retry logic with cooldown support**
- Introduced `max-retry-interval` configuration for cooldown durations between retries.
- Added `SetRetryConfig` in `Manager` to handle retry attempts and cooldown intervals.
- Enhanced provider execution logic to include retry attempts, cooldown management, and dynamic wait periods.
- Updated API endpoints and YAML configuration to support `max-retry-interval`.
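A hedged sketch of the retry loop (the Manager wiring and error classification are omitted; `maxRetryInterval` caps the per-attempt cooldown):

```go
package executor

import (
	"context"
	"time"
)

// executeWithRetry is illustrative: it retries a provider call up to maxRetries
// times, waiting between attempts but never longer than maxRetryInterval.
func executeWithRetry(ctx context.Context, maxRetries int, maxRetryInterval time.Duration,
	call func(context.Context) error) error {
	var err error
	cooldown := time.Second
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = call(ctx); err == nil {
			return nil
		}
		if attempt == maxRetries {
			break
		}
		if cooldown > maxRetryInterval {
			cooldown = maxRetryInterval // honour max-retry-interval
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(cooldown):
		}
		cooldown *= 2 // back off before the next attempt
	}
	return err
}
```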
- Introduced `logOnErrorOnly` mode to enable logging only for error responses when request logging is disabled.
- Added endpoints to list and download error logs (`/request-error-logs`).
- Implemented error log file cleanup to retain only the newest 10 logs.
- Refactored `ResponseWriterWrapper` to support forced logging for error responses.
- Enhanced middleware to capture data for upstream error persistence.
- Improved log file naming and error log filename generation.
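A sketch of the cleanup step, keeping only the newest 10 error log files (directory layout and naming are assumptions):

```go
package logging

import (
	"os"
	"path/filepath"
	"sort"
)

// pruneErrorLogs is illustrative: it deletes all but the newest `keep` files in dir.
func pruneErrorLogs(dir string, keep int) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	type file struct {
		name    string
		modTime int64
	}
	files := make([]file, 0, len(entries))
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		info, err := e.Info()
		if err != nil {
			continue
		}
		files = append(files, file{e.Name(), info.ModTime().UnixNano()})
	}
	// Newest first, then remove everything past the retention limit.
	sort.Slice(files, func(i, j int) bool { return files[i].modTime > files[j].modTime })
	for i := keep; i < len(files); i++ {
		_ = os.Remove(filepath.Join(dir, files[i].name))
	}
	return nil
}
```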
- Restored `thoughtSignature` validator bypass for model-specific parts in Gemini content processing.
- Removed redundant logic from the `executor` for cleaner handling.
Remove `generationConfig.maxOutputTokens`, `generationConfig.responseMimeType`, and `generationConfig.responseJsonSchema` from the Gemini payload in `translateRequest` so we no longer send unsupported or conflicting response-configuration fields. This lets the backend or caller control response formatting and output limits, and helps prevent API errors caused by these keys.
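In sjson terms the change amounts to something like the following sketch (the real `translateRequest` does considerably more):

```go
package translator

import "github.com/tidwall/sjson"

// stripResponseConfig drops the response-shaping fields from the Gemini payload
// so the backend or caller controls formatting and output limits instead.
func stripResponseConfig(payload []byte) []byte {
	for _, path := range []string{
		"generationConfig.maxOutputTokens",
		"generationConfig.responseMimeType",
		"generationConfig.responseJsonSchema",
	} {
		if updated, err := sjson.DeleteBytes(payload, path); err == nil {
			payload = updated
		}
	}
	return payload
}
```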