This commit introduces several improvements to the AMP (Advanced Model Proxy) module:
- **Model Mapping Logic:** The `FallbackHandler` now uses a more robust approach for model mapping. It includes the extraction and preservation of dynamic "thinking suffixes" (e.g., `(xhigh)`) during mapping, ensuring that these configurations are correctly applied to the mapped model. A new `resolveMappedModel` function centralizes this logic for cleaner code.
- **ModelMapper Verification:** The `ModelMapper` in `model_mapping.go` now verifies that the target model of a mapping has available providers *after* normalizing it. This prevents mappings to non-existent or unresolvable models.
- **Gemini Thinking Configuration Cleanup:** In `gemini_thinking.go`, unnecessary `generationConfig.thinkingConfig.include_thoughts` and `generationConfig.thinkingConfig.thinkingBudget` fields are now deleted from the request body when applying Gemini thinking levels. This prevents potential conflicts or redundant configurations.
- **Testing:** A new test case `TestModelMapper_MapModel_TargetWithThinkingSuffix` has been added to `model_mapping_test.go` to specifically cover the preservation of thinking suffixes during model mapping.
The previous commit added thinkingLevel support but didn't apply it
when the reasoning effort came from model name suffix (e.g., model(minimal)).
This was because ResolveThinkingConfigFromMetadata returns nil for
level-based models, bypassing the metadata application.
Changes:
- Add ApplyGemini3ThinkingLevelFromMetadata for standard Gemini API
- Add ApplyGemini3ThinkingLevelFromMetadataCLI for CLI API format
- Update gemini_cli_executor to apply Gemini 3 thinkingLevel from metadata
- Update antigravity_executor to apply Gemini 3 thinkingLevel from metadata
- Update aistudio_executor to apply Gemini 3 thinkingLevel from metadata
- Add comprehensive test coverage for Gemini 3 thinkingLevel functions
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.
From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.
Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
- Gemini 3 Pro: ["low", "high"]
- Gemini 3 Flash: ["minimal", "low", "medium", "high"]
Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
appropriate thinkingLevel
- Existing configurations continue to work
Route OpenAI reasoning effort through ThinkingEffortToBudget for Claude
translators, preserve "minimal" when translating OpenAI Responses, and
treat blank/unknown efforts as no-ops for Gemini thinking configs.
Also map budget -1 to "auto" and expand cross-protocol thinking tests.
Move OpenAI `reasoning_effort` -> Gemini `thinkingConfig` budget logic into
shared helpers used by Gemini, Gemini CLI, and antigravity translators.
Normalize Claude thinking handling by preferring positive budgets, applying
budget token normalization, and gating by model support.
Always convert Gemini `thinkingBudget` back to OpenAI `reasoning_effort` to
support allowCompat models, and update tests for normalization behavior.
Ensure thinking settings translate correctly across providers:
- Only apply reasoning_effort to level-based models and derive it from numeric
budget suffixes when present
- Strip effort string fields for budget-based models and skip Claude/Gemini
budget resolution for level-based or unsupported models
- Default Gemini include_thoughts when a nonzero budget override is set
- Add cross-protocol conversion and budget range tests
- Added support for parsing and normalizing dynamic thinking model suffixes.
- Centralized budget resolution across executors and payload helpers.
- Retired legacy Gemini-specific thinking handlers in favor of unified logic.
- Updated executors to use metadata-based thinking configuration.
- Added `ResolveOriginalModel` utility for resolving normalized upstream models using request metadata.
- Updated executors (Gemini, Codex, iFlow, OpenAI, Qwen) to incorporate upstream model resolution and substitute model values in payloads and request URLs.
- Ensured fallbacks handle cases with missing or malformed metadata to derive models robustly.
- Refactored upstream model resolution to dynamically incorporate metadata for selecting and normalizing models.
- Improved handling of thinking configurations and model overrides in executors.
- Removed hardcoded thinking model entries and migrated logic to metadata-based resolution.
- Updated payload mutations to always include the resolved model.
Adds support for the `-reasoning` model name suffix which enables
thinking/reasoning mode with dynamic budget. This allows clients to
request reasoning-enabled inference using model names like
`gemini-2.5-flash-reasoning` without explicit configuration.
The suffix is normalized to the base model (e.g., gemini-2.5-flash)
with thinkingBudget=-1 (dynamic) and include_thoughts=true.
Follows the existing pattern established by -nothinking and
-thinking-N suffixes.
**feat(executor): add thinking level to budget conversion utility**
- Introduced `ConvertThinkingLevelToBudget` to map thinking level ("high"/"low") to corresponding budget values.
- Applied the utility in `aistudio_executor.go` before stripping unsupported configs.
- Updated dependencies to include `tidwall/gjson` for JSON parsing.
- add fallback handler that forwards Amp provider requests to ampcode.com when the provider isn’t configured locally
- wrap AMP provider routes with the fallback so requests always have a handler
- share Gemini thinking model normalization helper between core handlers and AMP fallback
feat(gemini): add Gemini thinking configuration support and metadata normalization
- Introduced logic to parse and apply `thinkingBudget` and `include_thoughts` configurations from metadata.
- Enhanced request handling to include normalized Gemini model metadata, preserving the original model identifier.
- Updated Gemini and Gemini-CLI executors to apply thinking configuration based on metadata overrides.
- Refactored handlers to support metadata extraction and cloning during request preparation.