Refactored `applyPayloadConfig` to `applyPayloadConfigWithRoot`, adding support for default rule validation against the original payload when available. Updated all executors to use `applyPayloadConfigWithRoot` and incorporate an optional original request payload for translations.
The previous commit added thinkingLevel support but didn't apply it
when the reasoning effort came from model name suffix (e.g., model(minimal)).
This was because ResolveThinkingConfigFromMetadata returns nil for
level-based models, bypassing the metadata application.
Changes:
- Add ApplyGemini3ThinkingLevelFromMetadata for standard Gemini API
- Add ApplyGemini3ThinkingLevelFromMetadataCLI for CLI API format
- Update gemini_cli_executor to apply Gemini 3 thinkingLevel from metadata
- Update antigravity_executor to apply Gemini 3 thinkingLevel from metadata
- Update aistudio_executor to apply Gemini 3 thinkingLevel from metadata
- Add comprehensive test coverage for Gemini 3 thinkingLevel functions
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.
From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.
Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
- Gemini 3 Pro: ["low", "high"]
- Gemini 3 Flash: ["minimal", "low", "medium", "high"]
Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
appropriate thinkingLevel
- Existing configurations continue to work
Expose thinking/effort normalization helpers from the executor package
so conversion tests use production code and stay aligned with runtime
validation behavior.
Add package and constructor documentation for AI Studio, Antigravity,
Gemini CLI, Gemini API, and Vertex executors to describe their roles and
inputs.
Introduce a shared stream scanner buffer constant in the Gemini API
executor and reuse it in Gemini CLI and Vertex streaming code so stream
handling uses a consistent configuration.
Update Refresh implementations for AI Studio, Gemini CLI, Gemini API
(API key), and Vertex executors to short‑circuit and simply return the
incoming auth object, while keeping Antigravity token renewal as the
only executor that performs OAuth refresh.
Remove OAuth2-based token refresh logic and related dependencies from
the Gemini API executor, since it now operates strictly with API key
credentials.
Extract applyThinkingMetadata and applyThinkingMetadataCLI helpers to
payload_helpers.go and use them across all four Gemini-based executors:
- gemini_executor.go (Execute, ExecuteStream, CountTokens)
- gemini_cli_executor.go (Execute, ExecuteStream, CountTokens)
- aistudio_executor.go (translateRequest)
- antigravity_executor.go (Execute, ExecuteStream)
This eliminates code duplication introduced in the -reasoning suffix PR
and centralizes the thinking config application logic.
Net reduction: 28 lines of code.
Remove generationConfig.maxOutputTokens, generationConfig.responseMimeType and generationConfig.responseJsonSchema from the Gemini payload in translateRequest so we no longer send unsupported or conflicting response configuration fields. This lets the backend or caller control response formatting and output limits and helps prevent potential API errors caused by these keys.
**feat(executor): enhance WebSocket error handling and metadata logging**
- Added handling for stream closure before start with appropriate error recording.
- Improved metadata logging for non-OK HTTP status codes in WebSocket responses.
- Consolidated event processing logic with `processEvent` for better error handling and payload management.
- Refactored stream initialization to include the first event handling for smoother execution flow.
**feat(executor): add thinking level to budget conversion utility**
- Introduced `ConvertThinkingLevelToBudget` to map thinking level ("high"/"low") to corresponding budget values.
- Applied the utility in `aistudio_executor.go` before stripping unsupported configs.
- Updated dependencies to include `tidwall/gjson` for JSON parsing.
Introduce `PayloadConfig` in the configuration to define default and override rules for modifying payload parameters. Implement `applyPayloadConfig` and `applyPayloadConfigWithRoot` to apply these rules across various executors, ensuring consistent parameter handling for different models and protocols. Update all relevant executors to utilize this functionality.
- Introduce Server.AttachWebsocketRoute(path, handler) to mount websocket
upgrade handlers on the Gin engine.
- Track registered WS paths via wsRoutes with wsRouteMu to prevent
duplicate registrations; initialize in NewServer and import sync.
- Add Manager.UnregisterExecutor(provider) for clean executor lifecycle
management.
- Add github.com/gorilla/websocket v1.5.3 dependency and update go.sum.
Motivation: enable services to expose WS endpoints through the core server
and allow removing auth executors dynamically while avoiding duplicate
route setup. No breaking changes.