Commit Graph

143 Commits

Author SHA1 Message Date
Luis Pater
0fd2abbc3b **refactor(cliproxy, config): remove vertex-compat flow, streamline Vertex API key handling**
- Removed `vertex-compat` executor and related configuration.
- Consolidated Vertex compatibility checks into `vertex` handling with `apikey`-based model resolution.
- Streamlined model generation logic for Vertex API key entries.
2025-12-02 09:18:24 +08:00
Aero
0ebb654019 feat: Add support for VertexAI compatible service (#375)
feat: consolidate Vertex AI compatibility with API key support in Gemini
2025-12-02 08:14:22 +08:00
auroraflux
1c6f4be8ae refactor(executor): dedupe thinking metadata helpers across Gemini executors
Extract applyThinkingMetadata and applyThinkingMetadataCLI helpers to
payload_helpers.go and use them across all four Gemini-based executors:
- gemini_executor.go (Execute, ExecuteStream, CountTokens)
- gemini_cli_executor.go (Execute, ExecuteStream, CountTokens)
- aistudio_executor.go (translateRequest)
- antigravity_executor.go (Execute, ExecuteStream)

This eliminates code duplication introduced in the -reasoning suffix PR
and centralizes the thinking config application logic.

Net reduction: 28 lines of code.
2025-11-30 15:20:15 -08:00
Luis Pater
73208c4e55 Merge pull request #376 from auroraflux/feat/reasoning-suffix-support
feat(util): add -reasoning suffix support for Gemini models
2025-11-30 20:55:38 +08:00
auroraflux
32d3809f8c **feat(util): add -reasoning suffix support for Gemini models**
Adds support for the `-reasoning` model name suffix which enables
thinking/reasoning mode with dynamic budget. This allows clients to
request reasoning-enabled inference using model names like
`gemini-2.5-flash-reasoning` without explicit configuration.

The suffix is normalized to the base model (e.g., gemini-2.5-flash)
with thinkingBudget=-1 (dynamic) and include_thoughts=true.

Follows the existing pattern established by -nothinking and
-thinking-N suffixes.
2025-11-30 01:18:57 -08:00
Luis Pater
a748e93fd9 **fix(executor, auth): ensure index assignment consistency for auth objects**
- Updated `usage_helpers.go` to call `EnsureIndex()` for proper index assignment in reporter initialization.
- Adjusted `auth/manager.go` to assign auth indices inside a locked section when they are unassigned, ensuring thread safety and consistency.
2025-11-30 16:56:29 +08:00
nestharus
e73cdf5cff fix(claude): ensure max_tokens exceeds thinking budget for thinking models
Fixes an issue where Claude thinking models would return 400 errors when
the thinking.budget_tokens was greater than or equal to max_tokens.

Changes:
- Add MaxCompletionTokens: 128000 to all Claude thinking model definitions
- Add ensureMaxTokensForThinking() function in claude_executor.go that:
  - Checks if thinking is enabled with a budget_tokens value
  - Looks up the model's MaxCompletionTokens from the registry
  - Ensures max_tokens is set to at least the model's MaxCompletionTokens
  - Falls back to budget_tokens + 4000 buffer if registry lookup fails

This ensures Anthropic API constraint (max_tokens > thinking.budget_tokens)
is always satisfied when using extended thinking features.

Fixes: #339

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 22:31:05 -08:00
Luis Pater
a4a26d978e Fixed: #339
**feat(handlers, executor): add Gemini 3 Pro Preview support and refine Claude system instructions**

- Added support for the new "Gemini 3 Pro Preview" action in Gemini handlers, including detailed metadata and configuration.
- Removed redundant `cache_control` field from Claude system instructions for cleaner payload structure.
2025-11-26 11:42:57 +08:00
Luis Pater
ed9f6e897e Fixed: #337
**fix(executor): replace redundant commented code with `checkSystemInstructions` helper**

- Replaced commented-out `sjson.SetRawBytes` lines with the new `checkSystemInstructions` function.
- Centralized system instruction handling for better code clarity and reuse.
- Ensured consistent logic for managing `system` field across Claude executor flows.
2025-11-26 08:27:48 +08:00
Luis Pater
2e5681ea32 Merge branch 'dev' into feat/claude-thinking-and-beta-headers 2025-11-26 02:16:40 +08:00
Luis Pater
52c17f03a5 **fix(executor): comment out redundant code for setting Claude system instructions**
- Commented out multiple instances of `sjson.SetRawBytes` for setting `system` key to Claude instructions as they are redundant.
- Code cleanup to improve clarity and maintainability without affecting functionality.
2025-11-26 02:06:16 +08:00
nestharus
d0e694d4ed feat(claude): add thinking model variants and beta headers support
- Add Claude thinking model definitions (sonnet-4-5-thinking, opus-4-5-thinking variants)
- Add Thinking support for antigravity models with -thinking suffix
- Add injectThinkingConfig() for automatic thinking budget based on model suffix
- Add resolveUpstreamModel() mappings for thinking variants to actual Claude models
- Add extractAndRemoveBetas() to convert betas array to anthropic-beta header
- Update applyClaudeHeaders() to merge custom betas from request body

Closes #324
2025-11-25 03:33:05 -08:00
Luis Pater
113db3c5bf **fix(executor): update antigravity executor to enhance model metadata handling**
- Added additional metadata fields (`Name`, `Description`, `DisplayName`, `Version`) to `ModelInfo` struct initialization for better model representation.
- Removed unnecessary whitespace in the code.
2025-11-25 09:19:01 +08:00
Luis Pater
ddb0c0ec1c **fix(translator): reintroduce thoughtSignature bypass logic for model parts**
- Restored `thoughtSignature` validator bypass for model-specific parts in Gemini content processing.
- Removed redundant logic from the `executor` for cleaner handling.
2025-11-23 20:52:23 +08:00
hkfires
62bfd62871 fix(aistudio): strip Gemini generation config overrides
Remove generationConfig.maxOutputTokens, generationConfig.responseMimeType and generationConfig.responseJsonSchema from the Gemini payload in translateRequest so we no longer send unsupported or conflicting response configuration fields. This lets the backend or caller control response formatting and output limits and helps prevent potential API errors caused by these keys.
2025-11-23 19:44:03 +08:00
Luis Pater
257621c5ed **chore(executor): update default agent version and simplify const formatting**
- Updated `defaultAntigravityAgent` to version `1.11.5`.
- Adjusted const value formatting for improved readability.

**feat(executor): introduce fallback mechanism for Antigravity base URLs**

- Added retry logic with fallback order for Antigravity base URLs to handle request errors and rate limits.
- Refactored base URL handling with `antigravityBaseURLFallbackOrder` and related utilities.
- Enhanced error handling in non-streaming and streaming requests with retry support and improved metadata reporting.
- Updated `buildRequest` to support dynamic base URL assignment.
2025-11-23 17:53:07 +08:00
Luis Pater
ac064389ca **feat(executor, translator): enhance token handling and payload processing**
- Improved Antigravity executor to handle `thinkingConfig` adjustments and default `thinkingBudget` when `thinkingLevel` is removed.
- Updated translator response handling to set default values for output token counts when specific token data is missing.
2025-11-23 11:32:37 +08:00
Luis Pater
8d23ffc873 **feat(executor): add model alias mapping and improve Antigravity payload handling**
- Introduced `modelName2Alias` and `alias2ModelName` functions for mapping between model names and aliases.
- Improved Antigravity payload transformation to include alias-to-model name conversion.
- Enhanced processing for Claude Sonnet models to adjust template parameters based on schema presence.
2025-11-23 03:16:14 +08:00
hkfires
166fa9e2e6 fix(gemini): parse stream usage from JSON, skip thoughtSignature 2025-11-22 16:07:12 +08:00
hkfires
88e566281e fix(gemini): filter SSE usage metadata in streams 2025-11-22 15:53:36 +08:00
hkfires
d32bb9db6b fix(runtime): treat non-empty finishReason as terminal 2025-11-22 15:39:46 +08:00
hkfires
8356b35320 fix(executor): expire stop chunks without usage metadata 2025-11-22 15:27:47 +08:00
hkfires
19a048879c feat(runtime): track antigravity usage and token counts 2025-11-22 14:04:28 +08:00
hkfires
1061354b2f fix: handle empty and non-JSON SSE chunks safely 2025-11-22 13:49:23 +08:00
hkfires
46b4110ff3 fix: preserve SSE usage metadata-only trailing chunks 2025-11-22 13:25:25 +08:00
hkfires
8ce22b8403 fix(sse): preserve usage metadata for stop chunks 2025-11-22 12:50:23 +08:00
Luis Pater
d291eb9489 Fixed: #302
**feat(executor): enhance WebSocket error handling and metadata logging**

- Added handling for stream closure before start with appropriate error recording.
- Improved metadata logging for non-OK HTTP status codes in WebSocket responses.
- Consolidated event processing logic with `processEvent` for better error handling and payload management.
- Refactored stream initialization to include the first event handling for smoother execution flow.
2025-11-22 11:18:13 +08:00
Luis Pater
c1031e2d3f **feat(translator): add Antigravity translation logic**
- Introduced request and response translation functions to enable compatibility between OpenAI Chat Completions API and Antigravity.
- Registered translation utilities for both streaming and non-streaming scenarios.
- Added support for reasoning content, tool calls, and metadata handling.
- Established request normalization and embedding for Antigravity-compatible payloads.
- Added new fields to `Params` struct for better tracking of finish reasons, usage metadata, and tool usage.
- Refactored handling of response transitions, final events, and state-driven logic in `ConvertAntigravityResponseToClaude`.
- Introduced `appendFinalEvents` and `resolveStopReason` helper functions for cleaner separation of concerns.
- Added `TotalTokenCount` field to `Params` struct for enhanced token tracking.
- Updated token count calculations to fallback on `TotalTokenCount` when specific counts are missing.
- Introduced `hasNonZeroUsageMetadata` function to validate presence of token data in `usage_metadata`.
2025-11-21 23:40:59 +08:00
hkfires
4ba5b43d82 feat(executor): share SSE usage filtering across streams 2025-11-21 16:51:05 +08:00
Luis Pater
2d84d2fb6a **feat(auth, executor, cmd): add Antigravity provider integration**
- Implemented OAuth login flow for the Antigravity provider in `auth/antigravity.go`.
- Added `AntigravityExecutor` for handling requests and streaming via Antigravity APIs.
- Created `antigravity_login.go` command for triggering Antigravity authentication.
- Introduced OpenAI-to-Antigravity translation logic in `translator/antigravity/openai/chat-completions`.

**refactor(translator, executor): update Gemini CLI response translation and add Antigravity payload customization**

- Renamed Gemini CLI translation methods to align with response handling (`ConvertGeminiCliResponseToGemini` and `ConvertGeminiCliResponseToGeminiNonStream`).
- Updated `init.go` to reflect these method changes.
- Introduced `geminiToAntigravity` function to embed metadata (`model`, `userAgent`, `project`, etc.) into Antigravity payloads.
- Added random project, request, and session ID generators for enhanced tracking.
- Streamlined `buildRequest` to use `geminiToAntigravity` transformation before request execution.
2025-11-21 12:43:16 +08:00
Luis Pater
cbcfeb92cc Fixed: #291
**feat(executor): add thinking level to budget conversion utility**

- Introduced `ConvertThinkingLevelToBudget` to map thinking level ("high"/"low") to corresponding budget values.
- Applied the utility in `aistudio_executor.go` before stripping unsupported configs.
- Updated dependencies to include `tidwall/gjson` for JSON parsing.
2025-11-21 00:48:12 +08:00
Luis Pater
d50b0f7524 **refactor(executor): simplify Gemini CLI execution and remove internal retry logic**
- Removed nested retry handling for 429 rate limit errors.
- Simplified request/response handling by cleaning redundant retry-related code.
- Eliminated `parseRetryDelay` function and max retry configuration logic.
2025-11-20 17:49:37 +08:00
Ben Vargas
0ff094b87f fix(executor): prevent streaming on failed response when no fallback
Fix critical bug where ExecuteStream would create a streaming channel
from a failed (non-2xx) response after exhausting all retries with no
fallback models available.

When retries were exhausted on the last model, the code would break from
the inner loop but fall through to streaming channel creation (line 401),
immediately returning at line 461. This made the error handling code at
lines 464-471 unreachable, causing clients to receive an empty/closed
stream instead of a proper error response.

Solution: Check if httpResp is non-2xx before creating the streaming
channel. If failed, continue the outer loop to reach error handling.

Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484560423
2025-11-19 13:14:40 -07:00
Ben Vargas
ed23472d94 fix(executor): prevent streaming from 429 response when fallback available
Fix critical bug where ExecuteStream would create a streaming channel
using a 429 error response instead of continuing to the next fallback
model after exhausting retries.

When 429 retries were exhausted and a fallback model was available,
the inner retry loop would break but immediately fall through to the
streaming channel creation, attempting to stream from the failed 429
response instead of trying the next model.

Solution: Add shouldContinueToNextModel flag to explicitly skip the
streaming logic and continue the outer model loop when appropriate.

Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484479106
2025-11-19 13:05:38 -07:00
Ben Vargas
6a3de3a89c feat(executor): add intelligent retry logic for 429 rate limits
Implement Google RetryInfo.retryDelay support for handling 429 rate
limit errors. Retries same model up to 3 times using exact delays
from Google's API before trying fallback models.

- Add parseRetryDelay() to extract Google's retry guidance
- Implement inner retry loop in Execute() and ExecuteStream()
- Context-aware waiting with cancellation support
- Cap delays at 60s maximum for safety
2025-11-19 12:47:39 -07:00
Luis Pater
bf116b68f8 **feat(registry): add GPT-5.1 Codex Max model definitions and support**
- Introduced `gpt-5.1-codex-max` variants to model definitions (`low`, `medium`, `high`, `xhigh`).
- Updated executor logic to map effort levels for Codex Max models.
- Added `lastCodexMaxPrompt` processing for `gpt-5.1-codex-max` prompts.
- Defined instructions for `gpt-5.1-codex-max` in a new file: `codex_instructions/gpt-5.1-codex-max_prompt.md`.
2025-11-20 03:12:22 +08:00
Luis Pater
cc3cf09c00 **feat(auth): add AuthIndex for diagnostics and ensure usage recording** 2025-11-19 22:02:40 +08:00
hkfires
8a33f3ef69 fix: detect HTML error bodies without text/html content type 2025-11-19 14:45:33 +08:00
hkfires
b52a5cc066 feat(auth): add iFlow cookie-based authentication support 2025-11-18 22:35:35 +08:00
Luis Pater
db2d22c978 **fix(runtime): simplify scanner buffer allocation in executor implementations** 2025-11-18 10:59:49 +08:00
Luis Pater
1ccb01631d **refactor(runtime): centralize reasoning effort logic for GPT models**
Extract reasoning effort mapping into a reusable function `setReasoningEffortByAlias` to reduce redundancy and improve maintainability. Introduce support for the "gpt-5.1-none" variant in the registry and runtime executor.
2025-11-14 17:24:40 +08:00
Ben Vargas
cfbaed0e90 fix(runtime): remove gpt-5.1 minimal effort variant
Stop advertising and mapping the unsupported gpt-5.1-minimal variant in the model registry and Codex executor, and align bare gpt-5.1 requests to use medium reasoning effort like Codex CLI while preserving minimal for gpt-5.
2025-11-13 19:43:52 -07:00
Luis Pater
cf9b9be7ea **feat(runtime): extend executor support for GPT-5.1 Codex and variants**
Expand executor logic to handle GPT-5.1 Codex family and its variants, including reasoning effort configurations for minimal, low, medium, and high levels. Ensure proper mapping of models to payload parameters.
2025-11-14 08:08:25 +08:00
Luis Pater
fcd98f4f9b **feat(runtime): add payload configuration support for executors**
Introduce `PayloadConfig` in the configuration to define default and override rules for modifying payload parameters. Implement `applyPayloadConfig` and `applyPayloadConfigWithRoot` to apply these rules across various executors, ensuring consistent parameter handling for different models and protocols. Update all relevant executors to utilize this functionality.
2025-11-13 23:27:40 +08:00
Luis Pater
75b57bc112 Fixed: #246
feat(runtime): add support for GPT-5.1 models and variants

Introduce GPT-5.1 model family, including minimal, low, medium, high, Codex, and Codex Mini variants. Update tokenization and reasoning effort handling to accommodate new models in executor and registry.
2025-11-13 17:42:19 +08:00
Luis Pater
d0aa741d59 feat(gemini-cli): add multi-project support and enhance credential handling
Introduce support for multi-project Gemini CLI logins, including shared and virtual credential management. Enhance runtime, metadata handling, and token updates for better project granularity and consistency across virtual and shared credentials. Extend onboarding to allow activating all available projects.
2025-11-13 02:55:32 +08:00
Luis Pater
592f6fc66b feat(vertex): add usage source resolution for Vertex projects
Extend `resolveUsageSource` to support Vertex projects by extracting and normalizing `project_id` or `project` from the metadata for accurate source resolution.
2025-11-12 08:43:02 +08:00
Luis Pater
d6bd6f3fb9 feat(vertex, management): enhance token handling and OAuth2 integration
Extend `vertexAccessToken` to support proxy-aware HTTP clients and update calls accordingly for better configurability. Add `deleteTokenRecord` to handle token cleanup, improving management of authentication files.
2025-11-11 23:42:46 +08:00
Luis Pater
717eadf128 feat(vertex): add support for Vertex AI Gemini authentication and execution
Introduce Vertex AI Gemini integration with support for service account-based authentication, credential storage, and import functionality. Added new executor for Vertex AI requests, including execution and streaming paths, and integrated it into the core manager. Enhanced CLI with `--vertex-import` flag for importing service account keys.
2025-11-10 12:23:51 +08:00
Luis Pater
ef7e8206d3 fix(executor): ensure usage reporting for upstream responses lacking usage data
Add `ensurePublished` to guarantee request counting even when usage fields (e.g., tokens) are absent in OpenAI-compatible executor responses, particularly for streaming paths.
2025-11-09 17:24:47 +08:00