Guard the openai-response streaming path against truncated/invalid SSE data payloads by validating data: JSON before forwarding; surface a 502 terminal error instead of letting clients crash with JSON parse errors.
When /v1/responses streaming fails after headers are sent, we now emit a type=error chunk instead of an HTTP-style {error:{...}} payload, preventing AI SDK chunk validation errors.
- Introduced `passthrough-headers` option in configuration to control forwarding of upstream response headers.
- Updated handlers to respect the passthrough headers setting.
- Added tests to verify behavior when passthrough is enabled or disabled.
- Introduced unit tests for request logging middleware to enhance coverage.
- Added WebSocket-based Codex executor to support Responses API upgrade.
- Updated middleware logic to selectively capture request bodies for memory efficiency.
- Enhanced Codex configuration handling with new WebSocket attributes.
- Introduced `RequestKimiToken` API for Kimi authentication flow.
- Integrated device ID management throughout Kimi-related components.
- Enhanced header management for Kimi API requests with device ID context.
- Replaced all instances of `bytes.Clone` with direct references to enhance efficiency.
- Simplified payload handling across executors and translators by eliminating unnecessary data duplication.
- Replaced repetitive string operations with a centralized `escapeGJSONPathKey` function.
- Streamlined handling of JSON schema cleaning for Gemini and Antigravity requests.
- Improved payload management by transitioning from byte slices to strings for processing.
- Removed unnecessary cloning of byte slices in several places.
- Add firstChunkTimestamp field to ResponseWriterWrapper for sync capture
- Capture TTFB in Write() and WriteString() before async channel send
- Add SetFirstChunkTimestamp() to StreamingLogWriter interface
- Make requestTimestamp/apiResponseTimestamp required in LogRequest()
- Remove timestamp capture from WriteAPIResponse() (now via setter)
- Fix Gemini handler to set API_RESPONSE_TIMESTAMP before writing response
This ensures accurate TTFB measurement for all streaming API formats
(OpenAI, Gemini, Claude) by capturing timestamp synchronously when
the first response chunk arrives, not when the stream finalizes.
Previously:
- REQUEST INFO timestamp was captured at log write time (not request arrival)
- API RESPONSE had NO timestamp at all
This fix:
- Captures REQUEST INFO timestamp when request first arrives
- Adds API RESPONSE timestamp when upstream response arrives
Changes:
- Add Timestamp field to RequestInfo, set at middleware initialization
- Set API_RESPONSE_TIMESTAMP in appendAPIResponse() and gemini handler
- Pass timestamps through logging chain to writeNonStreamingLog()
- Add timestamp output to API RESPONSE section
This enables accurate measurement of backend response latency in error logs.
Introduce `TestExecuteStreamWithAuthManager_DoesNotRetryAfterFirstByte` to validate that stream executions do not retry after receiving partial responses. Implement `payloadThenErrorStreamExecutor` for test coverage of this behavior.
Optimize channel operations by introducing reusable context-aware send functions (`send` and `sendErr`) across `wsrelay`, `handlers`, and `cliproxy`. Ensure graceful handling of canceled contexts during stream operations.
Fixes#1082
When all Antigravity accounts are unavailable, the error response now returns
HTTP 502 (Bad Gateway) instead of HTTP 400 (Bad Request). This ensures that
NewAPI and other clients will retry the request on a different channel,
improving overall reliability.
Previously GeminiModels handler unconditionally overwrote displayName
and description with the model name, losing the original values defined
in model definitions (e.g., 'Gemini 3 Pro Preview').
Now only set these fields as fallback when they are missing or empty.
- Enhanced ID matching in `cliproxy` by adding additional conditions to better handle ID equality cases.
- Updated `gemini` handlers to include `displayName` and `description` in normalized models for enriched metadata.
- Updated `SDKConfig` to use `nonstream-keepalive-interval` (seconds) instead of the boolean `nonstream-keepalive`.
- Refactored handlers and logic to incorporate the new interval-based configuration.
- Updated config diff, tests, and example YAML to reflect the changes.
- Introduced `StartNonStreamingKeepAlive` to emit periodic blank lines during non-streaming responses.
- Added `nonstream-keepalive` configuration option in `SDKConfig`.
- Updated handlers to utilize `StartNonStreamingKeepAlive` and ensure proper cleanup.
- Extended config diff and tests to include `nonstream-keepalive` changes.
Improved consistency across OpenAI, Claude, and Gemini handlers by replacing initial `select` statement with a `for` loop for better readability and error-handling robustness.