Commit Graph

125 Commits

Author SHA1 Message Date
hkfires
166fa9e2e6 fix(gemini): parse stream usage from JSON, skip thoughtSignature 2025-11-22 16:07:12 +08:00
hkfires
88e566281e fix(gemini): filter SSE usage metadata in streams 2025-11-22 15:53:36 +08:00
hkfires
d32bb9db6b fix(runtime): treat non-empty finishReason as terminal 2025-11-22 15:39:46 +08:00
hkfires
8356b35320 fix(executor): expire stop chunks without usage metadata 2025-11-22 15:27:47 +08:00
hkfires
19a048879c feat(runtime): track antigravity usage and token counts 2025-11-22 14:04:28 +08:00
hkfires
1061354b2f fix: handle empty and non-JSON SSE chunks safely 2025-11-22 13:49:23 +08:00
hkfires
46b4110ff3 fix: preserve SSE usage metadata-only trailing chunks 2025-11-22 13:25:25 +08:00
hkfires
8ce22b8403 fix(sse): preserve usage metadata for stop chunks 2025-11-22 12:50:23 +08:00
Luis Pater
d291eb9489 Fixed: #302
**feat(executor): enhance WebSocket error handling and metadata logging**

- Added handling for stream closure before start with appropriate error recording.
- Improved metadata logging for non-OK HTTP status codes in WebSocket responses.
- Consolidated event processing logic with `processEvent` for better error handling and payload management.
- Refactored stream initialization to include the first event handling for smoother execution flow.
2025-11-22 11:18:13 +08:00
Luis Pater
c1031e2d3f **feat(translator): add Antigravity translation logic**
- Introduced request and response translation functions to enable compatibility between OpenAI Chat Completions API and Antigravity.
- Registered translation utilities for both streaming and non-streaming scenarios.
- Added support for reasoning content, tool calls, and metadata handling.
- Established request normalization and embedding for Antigravity-compatible payloads.
- Added new fields to `Params` struct for better tracking of finish reasons, usage metadata, and tool usage.
- Refactored handling of response transitions, final events, and state-driven logic in `ConvertAntigravityResponseToClaude`.
- Introduced `appendFinalEvents` and `resolveStopReason` helper functions for cleaner separation of concerns.
- Added `TotalTokenCount` field to `Params` struct for enhanced token tracking.
- Updated token count calculations to fallback on `TotalTokenCount` when specific counts are missing.
- Introduced `hasNonZeroUsageMetadata` function to validate presence of token data in `usage_metadata`.
2025-11-21 23:40:59 +08:00
hkfires
4ba5b43d82 feat(executor): share SSE usage filtering across streams 2025-11-21 16:51:05 +08:00
Luis Pater
2d84d2fb6a **feat(auth, executor, cmd): add Antigravity provider integration**
- Implemented OAuth login flow for the Antigravity provider in `auth/antigravity.go`.
- Added `AntigravityExecutor` for handling requests and streaming via Antigravity APIs.
- Created `antigravity_login.go` command for triggering Antigravity authentication.
- Introduced OpenAI-to-Antigravity translation logic in `translator/antigravity/openai/chat-completions`.

**refactor(translator, executor): update Gemini CLI response translation and add Antigravity payload customization**

- Renamed Gemini CLI translation methods to align with response handling (`ConvertGeminiCliResponseToGemini` and `ConvertGeminiCliResponseToGeminiNonStream`).
- Updated `init.go` to reflect these method changes.
- Introduced `geminiToAntigravity` function to embed metadata (`model`, `userAgent`, `project`, etc.) into Antigravity payloads.
- Added random project, request, and session ID generators for enhanced tracking.
- Streamlined `buildRequest` to use `geminiToAntigravity` transformation before request execution.
2025-11-21 12:43:16 +08:00
Luis Pater
cbcfeb92cc Fixed: #291
**feat(executor): add thinking level to budget conversion utility**

- Introduced `ConvertThinkingLevelToBudget` to map thinking level ("high"/"low") to corresponding budget values.
- Applied the utility in `aistudio_executor.go` before stripping unsupported configs.
- Updated dependencies to include `tidwall/gjson` for JSON parsing.
2025-11-21 00:48:12 +08:00
Luis Pater
d50b0f7524 **refactor(executor): simplify Gemini CLI execution and remove internal retry logic**
- Removed nested retry handling for 429 rate limit errors.
- Simplified request/response handling by cleaning redundant retry-related code.
- Eliminated `parseRetryDelay` function and max retry configuration logic.
2025-11-20 17:49:37 +08:00
Ben Vargas
0ff094b87f fix(executor): prevent streaming on failed response when no fallback
Fix critical bug where ExecuteStream would create a streaming channel
from a failed (non-2xx) response after exhausting all retries with no
fallback models available.

When retries were exhausted on the last model, the code would break from
the inner loop but fall through to streaming channel creation (line 401),
immediately returning at line 461. This made the error handling code at
lines 464-471 unreachable, causing clients to receive an empty/closed
stream instead of a proper error response.

Solution: Check if httpResp is non-2xx before creating the streaming
channel. If failed, continue the outer loop to reach error handling.

Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484560423
2025-11-19 13:14:40 -07:00
Ben Vargas
ed23472d94 fix(executor): prevent streaming from 429 response when fallback available
Fix critical bug where ExecuteStream would create a streaming channel
using a 429 error response instead of continuing to the next fallback
model after exhausting retries.

When 429 retries were exhausted and a fallback model was available,
the inner retry loop would break but immediately fall through to the
streaming channel creation, attempting to stream from the failed 429
response instead of trying the next model.

Solution: Add shouldContinueToNextModel flag to explicitly skip the
streaming logic and continue the outer model loop when appropriate.

Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484479106
2025-11-19 13:05:38 -07:00
Ben Vargas
6a3de3a89c feat(executor): add intelligent retry logic for 429 rate limits
Implement Google RetryInfo.retryDelay support for handling 429 rate
limit errors. Retries same model up to 3 times using exact delays
from Google's API before trying fallback models.

- Add parseRetryDelay() to extract Google's retry guidance
- Implement inner retry loop in Execute() and ExecuteStream()
- Context-aware waiting with cancellation support
- Cap delays at 60s maximum for safety
2025-11-19 12:47:39 -07:00
Luis Pater
bf116b68f8 **feat(registry): add GPT-5.1 Codex Max model definitions and support**
- Introduced `gpt-5.1-codex-max` variants to model definitions (`low`, `medium`, `high`, `xhigh`).
- Updated executor logic to map effort levels for Codex Max models.
- Added `lastCodexMaxPrompt` processing for `gpt-5.1-codex-max` prompts.
- Defined instructions for `gpt-5.1-codex-max` in a new file: `codex_instructions/gpt-5.1-codex-max_prompt.md`.
2025-11-20 03:12:22 +08:00
Luis Pater
cc3cf09c00 **feat(auth): add AuthIndex for diagnostics and ensure usage recording** 2025-11-19 22:02:40 +08:00
hkfires
8a33f3ef69 fix: detect HTML error bodies without text/html content type 2025-11-19 14:45:33 +08:00
hkfires
b52a5cc066 feat(auth): add iFlow cookie-based authentication support 2025-11-18 22:35:35 +08:00
Luis Pater
db2d22c978 **fix(runtime): simplify scanner buffer allocation in executor implementations** 2025-11-18 10:59:49 +08:00
Luis Pater
1ccb01631d **refactor(runtime): centralize reasoning effort logic for GPT models**
Extract reasoning effort mapping into a reusable function `setReasoningEffortByAlias` to reduce redundancy and improve maintainability. Introduce support for the "gpt-5.1-none" variant in the registry and runtime executor.
2025-11-14 17:24:40 +08:00
Ben Vargas
cfbaed0e90 fix(runtime): remove gpt-5.1 minimal effort variant
Stop advertising and mapping the unsupported gpt-5.1-minimal variant in the model registry and Codex executor, and align bare gpt-5.1 requests to use medium reasoning effort like Codex CLI while preserving minimal for gpt-5.
2025-11-13 19:43:52 -07:00
Luis Pater
cf9b9be7ea **feat(runtime): extend executor support for GPT-5.1 Codex and variants**
Expand executor logic to handle GPT-5.1 Codex family and its variants, including reasoning effort configurations for minimal, low, medium, and high levels. Ensure proper mapping of models to payload parameters.
2025-11-14 08:08:25 +08:00
Luis Pater
fcd98f4f9b **feat(runtime): add payload configuration support for executors**
Introduce `PayloadConfig` in the configuration to define default and override rules for modifying payload parameters. Implement `applyPayloadConfig` and `applyPayloadConfigWithRoot` to apply these rules across various executors, ensuring consistent parameter handling for different models and protocols. Update all relevant executors to utilize this functionality.
2025-11-13 23:27:40 +08:00
Luis Pater
75b57bc112 Fixed: #246
feat(runtime): add support for GPT-5.1 models and variants

Introduce GPT-5.1 model family, including minimal, low, medium, high, Codex, and Codex Mini variants. Update tokenization and reasoning effort handling to accommodate new models in executor and registry.
2025-11-13 17:42:19 +08:00
Luis Pater
d0aa741d59 feat(gemini-cli): add multi-project support and enhance credential handling
Introduce support for multi-project Gemini CLI logins, including shared and virtual credential management. Enhance runtime, metadata handling, and token updates for better project granularity and consistency across virtual and shared credentials. Extend onboarding to allow activating all available projects.
2025-11-13 02:55:32 +08:00
Luis Pater
592f6fc66b feat(vertex): add usage source resolution for Vertex projects
Extend `resolveUsageSource` to support Vertex projects by extracting and normalizing `project_id` or `project` from the metadata for accurate source resolution.
2025-11-12 08:43:02 +08:00
Luis Pater
d6bd6f3fb9 feat(vertex, management): enhance token handling and OAuth2 integration
Extend `vertexAccessToken` to support proxy-aware HTTP clients and update calls accordingly for better configurability. Add `deleteTokenRecord` to handle token cleanup, improving management of authentication files.
2025-11-11 23:42:46 +08:00
Luis Pater
717eadf128 feat(vertex): add support for Vertex AI Gemini authentication and execution
Introduce Vertex AI Gemini integration with support for service account-based authentication, credential storage, and import functionality. Added new executor for Vertex AI requests, including execution and streaming paths, and integrated it into the core manager. Enhanced CLI with `--vertex-import` flag for importing service account keys.
2025-11-10 12:23:51 +08:00
Luis Pater
ef7e8206d3 fix(executor): ensure usage reporting for upstream responses lacking usage data
Add `ensurePublished` to guarantee request counting even when usage fields (e.g., tokens) are absent in OpenAI-compatible executor responses, particularly for streaming paths.
2025-11-09 17:24:47 +08:00
hkfires
cfb9cb8951 feat(config): support HTTP headers across providers 2025-11-08 20:52:05 +08:00
Luis Pater
67ad26c35a fix(executor): remove default reasoning effort for gpt-5-codex-mini 2025-11-08 11:56:32 +08:00
Luis Pater
30d448e73c fix(executor): update model name from codex-mini-latest to gpt-5-codex-mini 2025-11-08 11:17:40 +08:00
jeffnash
ec354f7a1a add default medium reasoning case for gpt-5-codex-mini
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-07 17:12:10 -08:00
jeffnash
240e782606 add default medium reasoning case for gpt-5-codex-mini
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-07 17:11:40 -08:00
Jeff Nash
fcb0293c0d feat(registry): add GPT-5 Codex Mini model variants
Adds three new Codex Mini model variants (mini, mini-medium, mini-high)
that map to codex-mini-latest. Codex Mini supports medium and high
reasoning effort levels only (no low/minimal). Base model defaults to
medium reasoning effort.
2025-11-07 17:07:39 -08:00
Luis Pater
c8f20a66a8 fix(executor): add logging and prompt cache key handling for OpenAI responses 2025-11-07 22:40:45 +08:00
Luis Pater
3ce1b4159b fix(executor): remove outdated Gemini model previews from CLI fallback order 2025-11-07 10:30:22 +08:00
Luis Pater
38cfbac8f0 fix(executor): adjust Anthropic-Beta header handling for consistent API requests 2025-11-03 20:49:01 +08:00
Luis Pater
5be4d22b9b fix(executor): ensure consistent header application in Claude API requests 2025-11-03 17:57:20 +08:00
Luis Pater
64774a5786 fix(executor): remove safetySettings from payload in token counting request 2025-11-03 17:31:43 +08:00
Luis Pater
89b0d53a09 fix(executor): remove safetySettings from payload for Gemini requests 2025-11-01 16:53:48 +08:00
hkfires
7c1c4ee60b feat(gemini): add Gemini API key endpoints 2025-10-31 11:09:28 +08:00
hkfires
a517290726 refactor(executor): summarize API error bodies of html in debug logs 2025-10-31 06:58:38 +08:00
hkfires
1bbbd16df6 chore(logging): clarify 429 rate-limit retries in Gemini executor 2025-10-29 19:19:18 +08:00
hkfires
58d30369b4 fix(gemini-cli): correctly strip/normalize thinking config by model 2025-10-29 19:19:18 +08:00
hkfires
7dd93a4a25 fix(executor): only apply thinking config to supported models 2025-10-29 19:19:17 +08:00
Luis Pater
9d42e4b239 feat(runtime): add User-Agent headers to codex and claude executors
- Standardized User-Agent strings for Codex and Claude executors to improve request tracing and compatibility.
- Updated header insertion logic in both executors for consistency.
2025-10-29 12:57:37 +08:00