Add an "Only Official Website" header to the three READMEs, an
About panel button, and a tray menu entry — all pointing to
ccswitch.io. Consolidates brand and SEO signals on the canonical
domain across docs, GUI, and system tray.
- Guard debug body serialization with `log::log_enabled!`; previously
serialized the filtered body to a throwaway String on every forward,
even with debug logging off.
- Skip SSE parse + UTF-8 buffer loop when no usage collector and debug
is off; the per-chunk `serde_json::from_str::<Value>` ran even in
pure passthrough mode.
- Add cheap per-app SSE event pre-filter (string `contains`) so usage
collectors only parse events that could contain usage (e.g. Claude
`message_start` / `message_delta`).
- Skip non-streaming response body JSON parse when usage logging is
disabled.
- Move `ProviderRouter::record_result` off the success response path
via `tokio::spawn` for non-HalfOpen state; that call internally does
`get_proxy_config_for_app` + `update_provider_health`, two SQLite
ops that previously blocked TTFB.
Also: dedupe `usage_logging_enabled` (was duplicated in handlers.rs)
and merge `SseUsageCollector::{new, new_filtered}` into a single
constructor that takes `Option<StreamUsageEventFilter>`.
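The pre-filter idea above can be sketched as follows. This is a minimal illustration, not the project's code: the shape of `StreamUsageEventFilter`, the `claude()` constructor, and the `should_parse` helper are assumptions made for the example; only the marker strings `message_start` / `message_delta` come from the text.

```rust
/// Assumed shape: substrings that mark SSE events able to carry usage.
pub struct StreamUsageEventFilter {
    markers: &'static [&'static str],
}

impl StreamUsageEventFilter {
    /// Claude reports usage on `message_start` and `message_delta`.
    pub const fn claude() -> Self {
        Self { markers: &["message_start", "message_delta"] }
    }

    /// Cheap substring check that runs before any JSON parsing.
    pub fn may_contain_usage(&self, raw_event: &str) -> bool {
        self.markers.iter().any(|m| raw_event.contains(m))
    }
}

/// Hypothetical gate: decide whether a chunk needs the expensive parse.
pub fn should_parse(
    filter: Option<&StreamUsageEventFilter>,
    has_collector: bool,
    debug_on: bool,
    raw_event: &str,
) -> bool {
    if debug_on {
        return true; // debug logging wants every event
    }
    if !has_collector {
        return false; // pure passthrough: skip parsing entirely
    }
    // No filter means "parse everything"; otherwise ask the filter.
    filter.map_or(true, |f| f.may_contain_usage(raw_event))
}
```

A substring match can produce false positives (a text delta that happens to contain `message_delta`), which only costs one extra parse; it cannot drop a real usage event.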
prompt_cache_key was falling back to provider.id when the client did not
supply a session, which collapsed every conversation onto a single key
and defeated upstream prefix caching. Only emit the key when a real
client-provided session/thread identity is available; otherwise let the
upstream use its default matching behaviour.
Additional fixes that affect cache stability:
- Canonicalise (sort) JSON keys in outgoing request bodies and in
tool_call arguments / tool_result content so semantically identical
requests produce identical byte sequences for upstream prefix caches.
- Exempt JSON Schema property maps (properties, patternProperties,
definitions, $defs) from the underscore-prefix filter so user-defined
schema keys like _id and _meta survive.
- Add a [CacheTrace] debug log with stable hashes for instructions,
tools, input and include to help diagnose cache misses.
- Thread session_id into the usage logger for request correlation.
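The prompt_cache_key rule described above amounts to "no real client session, no key". A minimal sketch, assuming a hypothetical helper name and assuming that a blank string counts as "no session":

```rust
/// Returns the cache key to send upstream, or None to omit the field.
/// `client_session` is the session/thread id supplied by the client, if any.
/// There is deliberately no fallback to a provider id here: a shared
/// fallback would collapse every conversation onto one key.
pub fn prompt_cache_key(client_session: Option<&str>) -> Option<String> {
    client_session
        .map(str::trim)
        .filter(|s| !s.is_empty()) // assumption: blank ids are treated as absent
        .map(str::to_owned)
}
```

Returning `Option` makes the "omit the field" case explicit at the call site, so the serializer can skip the key instead of sending a placeholder.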
- Apply rustfmt diffs in claude_desktop_config.rs
- Allow needless_return on current_platform_paths (cfg-mirrored arms)
- Allow too_many_arguments on RequestForwarder::forward
- Replace `let mut + reassign` with struct literals in tests
(settings, backup, provider, response_processor)
- Use Path::new instead of PathBuf::from to fix cmp_owned in misc tests
- Replace 3.14 with 3.5 in config test to avoid approx_constant lint
Claude Desktop strips the [1M] suffix from model IDs when sending
requests, causing route lookup to fail with "model route is not
configured". Fall back to base-name comparison when exact match misses.
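A sketch of the fallback lookup, under the assumption that routes are a simple list of (route name, upstream model) pairs. The function names and the route table shape are hypothetical; the real lookup may differ.

```rust
/// Strip a trailing bracketed suffix such as "[1M]" from a model id.
fn base_model_name(id: &str) -> &str {
    match id.rfind('[') {
        Some(idx) if id.ends_with(']') => id[..idx].trim_end(),
        _ => id,
    }
}

/// Exact match first; if that misses, compare base names.
pub fn find_route<'a>(routes: &'a [(&'a str, &'a str)], requested: &str) -> Option<&'a str> {
    routes
        .iter()
        .find(|(name, _)| *name == requested)
        .or_else(|| {
            routes
                .iter()
                .find(|(name, _)| base_model_name(name) == base_model_name(requested))
        })
        .map(|(_, upstream)| *upstream)
}
```

Trying the exact match first keeps a `[1M]` route and its plain sibling distinguishable whenever the client does send the suffix.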
- services/proxy.rs: collapse 10 repeated `OpenCode | OpenClaw | Hermes |
ClaudeDesktop` match arms into `_` fallthroughs.
- claude_desktop_config.rs: extract a `with_rollback` closure shared by
apply_provider_to_paths and restore_official_at_paths.
- useProviderActions.ts: replace the triple-nested ternary picking the
switch-success toast message with a flat let/if/else block.
Net -36 lines. No behavior change; cargo test and pnpm typecheck pass.
Adds a new ClaudeDesktop AppType that writes Claude Desktop's third-party
inference profile under configLibrary/, sharing _meta.json with other
launchers (Ollama-compatible) so cc-switch can coexist with them.
Two switch modes:
- direct: provider already exposes claude-* / anthropic/claude-* model
ids on Anthropic Messages, so Claude Desktop connects to it directly.
- proxy: cc-switch's local proxy acts as the inference gateway,
presenting only claude-* route names to Claude Desktop and mapping
them to real upstream models. Required after Anthropic restricted
Claude Desktop to claude-family ids.
Backend:
- New module claude_desktop_config with snapshot/rollback, official seed
bypass, /claude-desktop/v1/{models,messages} routes, and a single
source of truth for default proxy routes.
- Gateway token persisted in SQLite, validated on every proxied request.
- get_claude_desktop_status surfaces drift signals (stale models,
missing routes, proxy stopped, base URL mismatch, missing token).
Frontend:
- Slim ClaudeDesktopProviderForm independent from ProviderForm,
controlled by a top-level appId guard.
- ProviderList banner consumes the status query (5s polling) and
renders actionable diagnostics.
- ClaudeDesktopRouteToggle in the header to start/stop the local
gateway without touching takeover state.
- Three-locale i18n synchronised.
The hyper raw-write path preserves original header casing but rebuilds
TCP+TLS on every request — there is no connection pool — which was the
root cause of slow reverse-proxy throughput.
Only Anthropic-native requests actually need exact header-case
preservation. Route OpenAI/Copilot/Codex/Gemini/codex_oauth requests
through the pooled reqwest client (pool_max_idle_per_host=10,
tcp_keepalive=60s) instead, so warm connections get reused.
Streaming requests get a precise first-byte timeout via
tokio::time::timeout around reqwest's send() (which resolves on
response headers), with the body phase handed off to response_processor.
The streaming-detection helper now also covers Gemini SSE endpoints
and Accept: text/event-stream, not just body.stream.
Anthropic SDK assigns distinct semantics to the two env vars:
- ANTHROPIC_API_KEY -> x-api-key
- ANTHROPIC_AUTH_TOKEN -> Authorization: Bearer
The Claude adapter previously collapsed both into AuthStrategy::Anthropic
and then emitted Authorization: Bearer regardless, breaking strict
Anthropic-protocol endpoints (Anthropic official, Cloudflare AI Gateway,
OpenCode Go, DashScope) and silently overriding the user's intended auth
scheme.
- claude::extract_auth: infer strategy from env var name
(ANTHROPIC_AUTH_TOKEN -> ClaudeAuth, ANTHROPIC_API_KEY -> Anthropic),
matching the precedence already used by extract_key.
- claude::get_auth_headers: split the Anthropic arm so it emits
x-api-key, while ClaudeAuth and Bearer continue to use Bearer.
- stream_check: reuse ClaudeAdapter::get_auth_headers as the single
source of truth, replacing the prior "always Bearer + maybe x-api-key"
double injection that produced auth conflicts and false-negative
health checks.
- Cover each strategy -> header mapping and env-var precedence with
new unit tests in claude.rs.
Refs #2368, #2380
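The strategy-to-header mapping described above can be sketched like this. The enum variants mirror the names in the message; the two helper functions and their signatures are assumptions for illustration, and precedence between the two env vars is left out because the text only says it matches extract_key.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthStrategy {
    Anthropic,  // key sent as x-api-key
    ClaudeAuth, // token sent as Authorization: Bearer
    Bearer,     // generic bearer token
}

/// Infer the strategy from the env var name that held the credential.
pub fn strategy_for_env_var(name: &str) -> Option<AuthStrategy> {
    match name {
        "ANTHROPIC_AUTH_TOKEN" => Some(AuthStrategy::ClaudeAuth),
        "ANTHROPIC_API_KEY" => Some(AuthStrategy::Anthropic),
        _ => None,
    }
}

/// Map a strategy to the single header it should emit.
pub fn auth_header(strategy: AuthStrategy, secret: &str) -> (&'static str, String) {
    match strategy {
        AuthStrategy::Anthropic => ("x-api-key", secret.to_string()),
        AuthStrategy::ClaudeAuth | AuthStrategy::Bearer => {
            ("Authorization", format!("Bearer {}", secret))
        }
    }
}
```

Emitting exactly one header per strategy is what removes the earlier double injection: a caller never ends up with both x-api-key and Authorization set.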
Claude Code injects a dynamic `x-anthropic-billing-header` line at the
start of `system` content. Its rotating `cch=` token was forwarded into
OpenAI Responses `instructions` and Chat system messages, which broke
upstream prefix prompt cache reuse — a stable ~95k-token prefix was
getting re-charged on every request.
Strip only the leading occurrence in both anthropic_to_openai and
anthropic_to_responses; later occurrences are preserved so user-authored
prompt text containing the same string is not lost.
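A minimal sketch of "strip only the leading occurrence", assuming the injected line starts with the literal `x-anthropic-billing-header` and ends at the first newline. The function name and the exact line format are assumptions.

```rust
const BILLING_PREFIX: &str = "x-anthropic-billing-header";

/// Drop the first line of `system` only when it is the injected billing
/// line. Later lines are never inspected, so user-authored text that
/// mentions the same string is kept.
pub fn strip_leading_billing_header(system: &str) -> &str {
    let (first, rest) = match system.split_once('\n') {
        Some((first, rest)) => (first, rest),
        None => (system, ""),
    };
    if first.trim_start().starts_with(BILLING_PREFIX) {
        rest
    } else {
        system
    }
}
```

Because the rotating token sits at the very start of the prompt, removing it restores a byte-stable prefix, which is what prefix caches key on.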
Hermes aggregates all in-process API calls into a single sessions row
with the `model` field locked to the initial model, so the usage
dashboard cannot cleanly surface per-call billing context. Two rounds
of UI workarounds (raw mapping, then `<model> @ <host>` display) did
not resolve the user-facing confusion, so the whole tracking
integration is dropped for now.
Removes session_usage_hermes service (and its 17 tests), sync wiring
in commands/usage.rs and lib.rs, _hermes_session/hermes_session
entries in usage_stats SQL (provider_name_coalesce CASE and
effective_usage_log_filter IN clause), frontend Tab/banner/dropdown/
icon entries, and four i18n keys per locale.
Hermes app integration outside usage tracking (proxy routing,
session manager, config) is preserved. Pre-existing hermes rows in
proxy_request_logs are left as orphans — filtered out by the
updated SQL and never surfaced in the UI.
Zhipu's `data.limits[]` returns 1 entry for legacy plans (subscribed
before 2026-02-12) and 2 entries for current plans. Previously every
TOKENS_LIMIT entry was hardcoded as `five_hour`, so the weekly bucket
was rendered with the 5-hour i18n label.
Sort TOKENS_LIMIT entries by nextResetTime ascending and assign
`five_hour` to index 0, `weekly_limit` to index 1. Legacy plans
naturally degrade to a single five_hour tier.
Also harden the parser: case-insensitive type match (defends against
upstream casing changes), reuse TIER_FIVE_HOUR/TIER_WEEKLY_LIMIT
constants, and add 8 unit tests covering both plan shapes plus
defensive edge cases.
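The sort-then-assign rule can be sketched as below. The `Limit` struct and its field names are hypothetical stand-ins for the parsed `data.limits[]` entries; the tier constant values are taken from the labels in the text.

```rust
pub const TIER_FIVE_HOUR: &str = "five_hour";
pub const TIER_WEEKLY_LIMIT: &str = "weekly_limit";

/// Hypothetical parsed entry from `data.limits[]`.
pub struct Limit {
    pub kind: String,         // e.g. "TOKENS_LIMIT"
    pub next_reset_time: u64, // epoch timestamp of the next reset
}

/// Keep TOKENS_LIMIT entries (case-insensitive), sort by reset time
/// ascending, then label index 0 as five_hour and index 1 as weekly_limit.
pub fn assign_tiers(limits: &[Limit]) -> Vec<(&'static str, &Limit)> {
    let mut tokens: Vec<&Limit> = limits
        .iter()
        .filter(|l| l.kind.eq_ignore_ascii_case("TOKENS_LIMIT"))
        .collect();
    tokens.sort_by_key(|l| l.next_reset_time);
    [TIER_FIVE_HOUR, TIER_WEEKLY_LIMIT]
        .iter()
        .copied()
        .zip(tokens)
        .collect()
}
```

Zipping against a two-element label list means a legacy plan with one entry yields a single five_hour tier, and any unexpected third entry is ignored rather than mislabelled.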
* fix(dashscope): enhance usage parsing robustness to prevent VSCode crashes
Enhanced build_anthropic_usage_from_responses() to handle null, missing, empty,
and partial usage fields gracefully. This prevents VSCode Extension crashes with
"Cannot read properties of null (reading 'output_tokens')" when connecting to
DashScope (Alibaba Cloud Bailian) models.
Changes:
- Added defensive null checks and empty object detection
- Implemented OpenAI field name fallbacks (prompt_tokens/completion_tokens)
- Added comprehensive logging for malformed usage scenarios
- Fixed streaming SSE event handlers with null-safe usage access
- Preserved cache token fields even when input/output tokens are missing
This ensures the proxy never crashes on malformed Responses API usage objects,
returning valid Anthropic-compatible usage structures (input_tokens/output_tokens)
in all cases.
* fix(proxy): tighten Responses API usage fix per review
- Drop redundant fallback in streaming.rs Chat Completions path; the
existing if-let-Some guard already prevents usage:null, so the extra
layer was dead code and caused a fmt-breaking indentation issue.
- Demote partial-usage warn to debug. Streaming chunks legitimately
arrive with partial token counts and the warn-level log was noisy.
- Rewrite CHANGELOG entry: reference #2422, broaden scope from
DashScope-only to all api_format=openai_responses users (Codex OAuth
is the strongest signal; DashScope compatible-mode/v1/responses is
the original report).
- cargo fmt to clear 12 formatting differences vs main.
---------
Co-authored-by: Jason <farion1231@gmail.com>
* fix(config): sort JSON keys alphabetically for deterministic output
Ensures settings.json keys are written in sorted order, preventing
non-deterministic git diffs when switching configs.
* test(config): add unit tests for sort_json_keys and fix formatting
Cover top-level sort, nested recursion, array order preservation,
primitive pass-through, empty collections, and the core determinism
guarantee (different insertion orders must yield identical output).
Also fix line-length in write_json_file flagged by `cargo fmt --check`.
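The properties those tests pin down can be illustrated with a small stand-in. The `Json` enum below is a hypothetical substitute for the JSON value type the project actually uses, kept dependency-free for the example; only the function name `sort_json_keys` comes from the text.

```rust
/// Hypothetical stand-in for a JSON value; objects keep insertion order.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Recursively sort object keys; arrays keep their order and
/// primitives pass through unchanged.
pub fn sort_json_keys(value: Json) -> Json {
    match value {
        Json::Array(items) => Json::Array(items.into_iter().map(sort_json_keys).collect()),
        Json::Object(mut entries) => {
            entries.sort_by(|a, b| a.0.cmp(&b.0));
            Json::Object(
                entries
                    .into_iter()
                    .map(|(k, v)| (k, sort_json_keys(v)))
                    .collect(),
            )
        }
        other => other,
    }
}
```

Two values built in different insertion orders compare equal after sorting, which is the determinism guarantee the commit describes.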
---------
Co-authored-by: Jason <farion1231@gmail.com>
* Keep Codex history stable across provider switches
* Restore template Codex provider id when backfilling live config
Backfill writes the current Codex live config back to the previous
provider's stored template after a switch. Because the live file now
carries a normalized stable model_provider id, the previous provider's
template would lose its own provider-specific id (and any matching
[profiles.*] references) on every subsequent switch.
Reverse the normalization at backfill time by rewriting model_provider,
the active model_providers section, and matching profile references back
to the template's original id.
---------
Co-authored-by: Jason <farion1231@gmail.com>
Hermes:
- Parse ~/.hermes/state.db sessions (incl. profiles/*/state.db) into
proxy_request_logs with data_source='hermes_session', WAL-aware
incremental sync, Hermes-reported cost preferred over model_pricing
fallback
Zero-cost bug (dashboard showed $0 totals):
- GPT-5.5 family default pricing (~83% of affected rows used GPT-5.5)
- find_model_pricing_row: ASCII-lowercase normalization so
"OpenAI/GPT-5.5@HIGH" matches seeded "gpt-5.5"
- Startup cost backfill in async task: scan rows where total_cost <= 0
but tokens > 0, recompute via model_pricing in a single transaction
Performance:
- Add (app_type, created_at DESC) covering index for dashboard range
queries
- Add expression index on COALESCE(data_source, 'proxy') so dedup EXISTS
subqueries use index lookup instead of full scan; drop superseded
idx_request_logs_dedup_lookup
Refactor:
- row_to_request_log_detail helper (3-way de-dup; fixes cost_multiplier
"1" vs "1.0" drift between callers)
- Promote get_sync_state/update_sync_state to shared session_usage
module (4 copies -> 1)
- run_step helper in lib.rs replaces 9 if-let-Err blocks
- maybe_backfill_log_costs returns bool to skip duplicate total_cost
parsing in caller
Proxy writes and session-log sync wrote to proxy_request_logs with
mismatched request_ids: only Claude on a native Anthropic backend used the
shared `session:{message_id}` key. Codex/Gemini and Claude-through-OpenAI
providers always produced distinct ids, so primary-key dedup never fired
and every real request was recorded twice.
Adds a 7-dim fingerprint dedup (app_type, 4 token counts, 2xx status,
model with case-insensitive match, ±10min window) wired into three layers:
- Write path: should_skip_session_insert() blocks duplicate session rows
before INSERT, unifying the previously-divergent Claude/Codex/Gemini
paths through a single DedupKey-based helper.
- Read path: effective_usage_log_filter() excludes already-covered session
rows from every aggregation query.
- Rollup path: same filter applied so usage_daily_rollups never absorbs
duplicates.
Also adds a covering index (idx_request_logs_dedup_lookup) so the EXISTS
subquery stays index-only, and a transform.rs regression test that pins
openai_to_anthropic id preservation - the missing piece that lets
Claude+OpenAI-compatible providers reuse the session: id scheme.
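A rough sketch of the fingerprint comparison. The struct fields are assumptions: the text names the seven dimensions but not which four token counts are used, so input, output, cache-read, and cache-creation counts are guessed here, and requiring both rows to be 2xx is likewise one possible reading. The real check runs in SQL; this only shows the matching rule.

```rust
/// Hypothetical fingerprint of one logged request.
#[derive(Debug, Clone)]
pub struct DedupKey {
    pub app_type: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,     // assumed dimension
    pub cache_creation_tokens: u64, // assumed dimension
    pub status_code: u16,
    pub model: String,
    pub created_at: i64, // Unix seconds
}

const WINDOW_SECS: i64 = 10 * 60; // the ±10min window from the text

fn is_2xx(status: u16) -> bool {
    (200..300).contains(&status)
}

/// True when two rows look like the same underlying request.
pub fn same_request(a: &DedupKey, b: &DedupKey) -> bool {
    a.app_type == b.app_type
        && a.input_tokens == b.input_tokens
        && a.output_tokens == b.output_tokens
        && a.cache_read_tokens == b.cache_read_tokens
        && a.cache_creation_tokens == b.cache_creation_tokens
        && is_2xx(a.status_code)
        && is_2xx(b.status_code)
        && a.model.eq_ignore_ascii_case(&b.model)
        && (a.created_at - b.created_at).abs() <= WINDOW_SECS
}
```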
The query_siliconflow function received an is_cn flag that only switched
the request domain (.cn vs .com) but the response builder hardcoded
unit="CNY" for both sites. International users at api.siliconflow.com
saw their USD balance labelled as CNY. Now unit and plan_name follow
is_cn, so the EN site shows USD and "SiliconFlow (EN)".
- Deduplicate repeated upstream `finish_reason` chunks so only one Anthropic `message_delta` is emitted.
- Preserve late `choices: []` usage-only chunks before sending the final `message_delta`.
- Keep stream error paths from emitting successful terminal events.
- Add regressions for duplicate finish reasons, usage-only chunks, missing `[DONE]`, and truncated streams.
* feat(provider-form): soften business-rule validation with "save anyway" prompt
Refactor handleSubmit so empty-field / missing-item validations (provider
name, endpoint, API key, opencode model, template variables, provider key
required) no longer hard-reject with toast.error. Instead they are collected
into an issues list and presented via a ConfirmDialog; the user can cancel
or choose "Save anyway" to proceed.
Integrity constraints stay as hard rejections:
- providerKey regex / duplicate (would corrupt other providers)
- Copilot / Codex OAuth not authenticated (no token, cannot establish)
- omo Other Fields JSON not an object / parse failure
This aligns the frontend with the backend's existing "relaxed save / strict
switch" split (see gemini_config.rs: validate_gemini_settings vs
validate_gemini_settings_strict) and unblocks legitimate configs such as
AWS Bedrock, Vertex AI, and custom Gemini base URLs that the UI previously
refused to save.
Refs: #2196, #1204
* fix(provider-form): address review feedback on soft-validation
P1: move empty providerKey back to hard rejection for OpenCode / OpenClaw /
Hermes. Since providerKey is the primary identity for these apps and the
mutations layer throws "Provider key is required" when absent, letting users
click "save anyway" would surface a generic error toast instead of a
precise, actionable one. Treat empty providerKey as an integrity constraint
alongside regex / duplicate checks.
P2: give the soft-confirm submit path its own submitting state. The
confirm-dialog path bypassed react-hook-form's isSubmitting lifecycle, so
slow or failing saves left the outer submit button responsive and could
spawn unhandled rejections. Now the confirm handler awaits performSubmit
inside try/catch/finally, uses an isConfirmSubmitting flag to gate both
confirm and cancel clicks, and folds the flag into the outer disabled
state and onSubmittingChange callback.
Refs: #2307 review comments
* chore(clippy): use push for single char '…' in truncate_body
Clippy 1.95 added single_char_add_str which flagged the push_str("…")
in truncate_body. Rebased onto latest upstream/main and applied the
suggested fix so the Backend Checks clippy job passes.
Unrelated to this PR's core changes; bundled in so the PR is mergeable
without waiting for a separate upstream fix.
---------
Co-authored-by: Allen <allen@AllenMacBook-M4-Pro.local>
DeepSeek released V4 flash/pro; legacy IDs deepseek-chat / deepseek-reasoner
now alias to deepseek-v4-flash and will be deprecated.
- Update claude/hermes/opencode/openclaw presets to v4-pro / v4-flash,
context 128K -> 1M; Claude Anthropic-compat endpoint routes OPUS/SONNET
to v4-pro and HAIKU to v4-flash, plus an explicit modelsUrl override.
- Seed deepseek-v4-flash ($0.14/$0.28 per 1M) and deepseek-v4-pro
($1.68/$3.36 per 1M) into model_pricing; older v3.x / chat / reasoner
rows kept for historical usage stats (INSERT OR IGNORE).
- Refresh user-manual (zh/en/ja) pricing table and note that legacy model
IDs are billed at v4-flash rates.
Providers like DeepSeek, Kimi, Zhipu GLM and MiniMax expose the
Anthropic-compatible API on a subpath (e.g. /anthropic) while the
OpenAI-style /models endpoint lives at the API root. The previous
heuristic blindly appended /v1/models to the Base URL, so every such
provider returned 404 and the UI mislabeled it as "provider does not
support fetching models".
Backend now generates a candidate list and tries them in order:
preset override -> baseURL /v1/models -> stripped-subpath /v1/models ->
stripped-subpath /models. Non-404/405 responses (auth, network) stop
immediately so we never retry against hostile status codes. Known
compat suffixes are kept in a length-descending constant so the
longest match wins; response bodies are truncated to 512 chars to
avoid HTML 404 pages bloating the error string.
Preset type gains an optional modelsUrl (DeepSeek points at
https://api.deepseek.com/models). Frontend threads the override
through fetchModelsForConfig when the current Base URL still matches
the preset default. A new fetchModelsEndpointNotFound i18n key
replaces the misleading "not supported" toast for exhausted-candidate
and 404/405 cases (zh/en/ja).
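The candidate ordering and the stop rule can be sketched as follows. The suffix list is a hypothetical example (only `/anthropic` is named in the text), and both function names are made up for the illustration.

```rust
/// Assumed compat suffixes, longest first so the longest match wins.
const COMPAT_SUFFIXES: &[&str] = &["/api/anthropic", "/anthropic"];

/// Build the ordered, de-duplicated list of model-list URLs to try.
pub fn candidate_urls(base_url: &str, preset_override: Option<&str>) -> Vec<String> {
    let base = base_url.trim_end_matches('/');
    let mut out: Vec<String> = Vec::new();
    let mut push = |url: String| {
        if !out.contains(&url) {
            out.push(url);
        }
    };
    if let Some(url) = preset_override {
        push(url.to_string()); // 1. preset override
    }
    push(format!("{}/v1/models", base)); // 2. base URL as given
    if let Some(root) = COMPAT_SUFFIXES.iter().find_map(|s| base.strip_suffix(*s)) {
        push(format!("{}/v1/models", root)); // 3. stripped subpath
        push(format!("{}/models", root)); // 4. stripped subpath, no /v1
    }
    out
}

/// Only "endpoint not here" statuses justify trying the next candidate.
pub fn should_try_next(status: u16) -> bool {
    matches!(status, 404 | 405)
}
```

For a base URL of `https://api.example.com/anthropic` with no override, this yields the `/anthropic/v1/models`, `/v1/models`, and `/models` candidates in that order.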
Copilot upstream returns model_not_supported when the client sends
dash-form Claude IDs (claude-sonnet-4-6, claude-sonnet-4-6[1m]) while
/models only accepts dot form (claude-sonnet-4.6, -1m suffix).
- Add copilot_model_map: syntax normalize (dash->dot, [1m]->-1m) plus
live /models exact match and family-version fallback, reusing the
existing 5 min auth cache. Returns None when the whole family is
absent so upstream surfaces an explicit error instead of silently
switching families.
- Wire into forwarder Copilot hook; runs before anthropic_to_openai
conversion.
- Default Opus slot in the Copilot preset maps to Sonnet 4.6: Pro
dropped all Opus on 2026-04-20 and Pro+ bills Opus 4.7 at 7.5x.
Users who want real Opus can switch manually in the UI.
Refs: https://github.com/farion1231/cc-switch/issues/2016
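The syntax-normalization step alone (not the live /models matching or the family fallback) might look like this. The function name is hypothetical, and two details are assumptions: ids are lowercased first, and a trailing number only counts as a minor version when it is at most two digits, so date-style suffixes are left alone.

```rust
/// Rewrite dash-form Claude ids to the dot form:
/// "claude-sonnet-4-6" -> "claude-sonnet-4.6", "[1m]" -> "-1m".
pub fn normalize_copilot_model_id(id: &str) -> String {
    let lower = id.trim().to_ascii_lowercase();
    let (base, long_ctx) = match lower.strip_suffix("[1m]") {
        Some(b) => (b, true),
        None => (lower.as_str(), false),
    };
    let parts: Vec<&str> = base.split('-').collect();
    let is_num = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    let n = parts.len();
    // Join "<major>-<minor>" into "<major>.<minor>" when both are numeric
    // and the minor part is short (assumption: at most two digits).
    let mut out = if n >= 2 && is_num(parts[n - 2]) && is_num(parts[n - 1]) && parts[n - 1].len() <= 2 {
        format!("{}.{}", parts[..n - 1].join("-"), parts[n - 1])
    } else {
        base.to_string()
    };
    if long_ctx {
        out.push_str("-1m");
    }
    out
}
```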
- Add v3.14.1 release notes (en/zh/ja) covering tray usage visibility,
Codex OAuth stability fixes, Skills import/install reliability, and
removal of the Hermes config health scanner
- Cut [Unreleased] into [3.14.1] in CHANGELOG with PR references
- Bump version in package.json, Cargo.toml, Cargo.lock, tauri.conf.json
dc04165f surfaced tray usage badges for Claude/Codex/Gemini official
OAuth only. Chinese coding-plan providers already expose 5h + weekly
windows through coding_plan::get_coding_plan_quota, but two gaps kept
the tray from rendering them.
- format_script_summary read only data.first(), truncating the tier-
flattened UsageResult to a single window. Detect plan_name matching
TIER_FIVE_HOUR / TIER_WEEKLY_LIMIT and emit the "🟢 h12% w80%" layout
used by format_subscription_summary; worst utilization drives the
emoji. Copilot / balance / custom scripts keep the legacy single-
bucket output via fallback.
- usage_script previously required manual activation through
UsageScriptModal. Auto-inject meta.usage_script on Claude provider
creation when ANTHROPIC_BASE_URL matches a known coding plan, so the
tray lights up without the user opening the modal. Does not overwrite
existing usage_script on update.
Extract the URL route table out of UsageScriptModal into a shared
codingPlanProviders module so the modal, the creation hook, and the
Rust coding_plan::detect_provider mirror all agree on one list.
Add TIER_WEEKLY_LIMIT alongside TIER_FIVE_HOUR and a createUsageScript()
factory to collapse the duplicated default fields across four call
sites and drop the remaining stringly-typed tier names.
The Hermes config.yaml schema has stabilized and users have migrated to
the current provider fields, so the value of scanning for model.provider
dangling references, custom_providers shape errors, v12 migration residue
etc. no longer justifies the maintenance surface — and the scan produces
false positives when users keep some providers under Hermes' v12+
providers: dict (Hermes' runtime merges both shapes, but CC Switch's
scanner only looked at the list form).
Removes the whole HermesHealthWarning type, scan_hermes_config_health
command, HermesHealthBanner React component, useHermesHealth hook,
warnings field on HermesWriteOutcome, and the three helper functions
(yaml_as_non_empty_str, collect_mapping_string_keys, hermes_warning)
that only served the scanner. Drops the matching i18n keys in
zh/en/ja and the fixInWebUI button label that only the banner used.
* feat: add Rust-side write-through usage cache
Introduce an in-memory UsageCache on AppState that the existing usage
query commands populate on success. The cache is read-only to the rest
of the app today; the next commit consumes it from the tray menu.
- New services::usage_cache module with split maps: subscription keyed
by AppType, script keyed by (AppType, provider_id).
- AppType gains Eq + Hash so it can be used as a HashMap key.
- commands::subscription::get_subscription_quota now takes State<AppState>
and writes through on success (signature change is invisible to the
frontend — Tauri injects State automatically).
- commands::provider::queryProviderUsage body extracted into an inner
async fn; the public command wraps it with write-through, covering
Copilot, coding-plan, balance, and generic script paths uniformly.
Cache is in-memory only; auto-query interval and the upcoming tray
refresh action rebuild it after restarts.
* feat(tray): surface cached usage in the system tray menu
Read UsageCache populated by the previous commit and render it in three
places, scoped to whatever TRAY_SECTIONS covers (Claude/Codex/Gemini):
1. Inline suffix on each provider submenu item
"AnyProvider · 🟢 5h 18% / 7d 23%"
2. Disabled summary row per visible app under "Show Main"
"Claude · Anthropic Official · 🟢 5h 18% / 7d 23%"
3. "Refresh all usage" menu item that triggers get_subscription_quota +
queryProviderUsage for every applicable provider, then rebuilds the
tray menu via the existing refresh_tray_menu path.
Color encoding uses emoji (🟢 <70% / 🟠 70-89% / 🔴 ≥90%) since Tauri 2
tray labels are plain text. Missing cache entry leaves the label
unchanged — tray never issues network requests when opened. Three new
i18n-ready strings live in TrayTexts (en/zh/ja), following the existing
pattern for tray text.
Closes #2178.
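The emoji thresholds above can be sketched as a pair of helpers. The function names are hypothetical, and letting the worse of the two windows pick the emoji is an assumption carried over from how the summary formatting is described elsewhere in this log.

```rust
/// Map a utilization percentage to the tray emoji:
/// below 70 is green, 70 to 89 is orange, 90 and above is red.
pub fn usage_emoji(percent: u32) -> &'static str {
    if percent >= 90 {
        "🔴"
    } else if percent >= 70 {
        "🟠"
    } else {
        "🟢"
    }
}

/// Build the inline suffix; the worse window drives the emoji (assumption).
pub fn format_usage_suffix(five_hour: u32, weekly: u32) -> String {
    let worst = five_hour.max(weekly);
    format!("{} 5h {}% / 7d {}%", usage_emoji(worst), five_hour, weekly)
}
```

With 18% and 23% this produces the same "🟢 5h 18% / 7d 23%" text shown in the examples above.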
* feat(usage): bridge tray UsageCache writes to frontend React Query
Why: a tray hover triggered a backend-only refresh that wrote to UsageCache but
never notified the frontend, leaving the main UI stale while the tray showed fresh
numbers. Emit a payload-carrying event after each cache write so React Query
can call setQueryData directly, keeping both views in sync without duplicate fetches.
* fix(tray): skip hidden apps on hover refresh and drop stale disabled-script cache
Address P2 findings from automated review on #2184:
1. refresh_all_usage_in_tray now filters TRAY_SECTIONS by settings.visible_apps
before scheduling subscription/script queries, matching create_tray_menu and
preventing wasted external API calls (and rate-limit/auth-error log noise)
for apps the user has hidden.
2. format_usage_suffix only trusts the script cache when provider.meta.usage_script
is still enabled; when a script is disabled/removed the cached suffix is now
invalidated so the tray label no longer shows stale data indefinitely.
* refactor: consolidate codex provider helpers and fix test semantics
- Add Provider::is_codex_oauth() and Provider::codex_fast_mode_enabled()
to eliminate duplicated meta extraction in claude.rs and stream_check.rs
- Fix non-codex-oauth tests to pass codex_fast_mode=false (was true, harmless
but semantically misleading)
- Remove redundant is_dir() guard after resolve_skill_source_dir already
guarantees the returned path is a directory
* style: apply cargo fmt
* fix(tray): reflect failed refreshes in cache and support Gemini flash-lite
Follow-up to the tray usage-display feature addressing review feedback:
- Write snapshots for both Ok(success:false) and Err paths in
queryProviderUsage / get_subscription_quota so stale success data
no longer persists across failed refreshes; the original Err is
still returned to the frontend onError handler.
- Include gemini_flash_lite tier in the tray summary with label "l".
Matches the frontend SubscriptionQuotaFooter and keeps the worst
emoji correct when lite is the highest utilization.
- Add TIER_GEMINI_PRO / _FLASH / _FLASH_LITE constants in
services/subscription.rs and reuse them in classify_gemini_model
and sort_order.
- Extract Provider::has_usage_script_enabled() to remove the
duplicated meta.usage_script chain at two call sites.
- Use db.get_provider_by_id in refresh_all_usage_in_tray instead of
materialising the full provider map, and parallelise subscription
and script futures via futures::future::join.
- Narrow refresh_all_usage_in_tray to each section's effective
current provider (script if enabled, else subscription when the
provider is official). Hover refreshes now issue at most
TRAY_SECTIONS.len() outbound requests.
- Add 10 unit tests in tray::tests covering Claude/Codex h/w dispatch,
Gemini p/f/l dispatch (including lite-only and lite-worst cases),
and success/failure guards.
---------
Co-authored-by: Jason <farion1231@gmail.com>
* Add Codex OAuth FAST mode toggle
* fix(codex-oauth): default FAST mode to off to avoid surprise quota burn
service_tier="priority" consumes ChatGPT subscription quota at a higher
rate. Users must now opt in explicitly rather than inherit FAST mode
silently when this feature ships.
---------
Co-authored-by: Jason <farion1231@gmail.com>
* Stabilize Codex OAuth cache routing
Codex OAuth-backed Claude proxy requests now reuse a client-provided
session identity for prompt cache routing and send Codex-like session
headers when that identity exists. Generated proxy UUIDs are
intentionally excluded so they do not fragment cache locality.
The same path exposed two runtime issues during validation: rustls
needed an explicit process crypto provider, and Codex OAuth can return
Responses SSE even when the original Claude request is non-streaming.
Those are handled so cache-routed requests can complete instead of
panicking or being parsed as JSON.
Constraint: Official Codex uses conversation identity and Responses session headers for prompt cache routing.
Rejected: Always use generated proxy session IDs | generated IDs change per request and reduce cache reuse.
Confidence: medium
Scope-risk: moderate
Directive: Do not remove the client-provided-session guard unless generated session IDs become stable per conversation.
Tested: cargo test codex_oauth
Tested: Local dev app health check on 127.0.0.1:15721
Tested: Local proxy logs showed cache_read_tokens after restart
Not-tested: Full cargo test without local cc-switch port conflict
Related: #2217
* feat(proxy): aggregate forced Codex OAuth SSE into JSON for non-streaming clients
Narrow override on top of #2235's streaming fallback.
Codex OAuth always forces upstream openai_responses into SSE, even
when the original Claude request is stream:false. #2235 handles this
by routing such responses through the streaming transform so the
client receives text/event-stream — that avoids the 422 that JSON
parsing would produce, and it also protects any other provider that
unexpectedly returns SSE (the response.is_sse() guard).
But for Claude SDK callers that sent stream:false, returning SSE
still violates the Anthropic non-streaming contract. This commit
adds an override on exactly one combination — non-streaming client
+ codex_oauth + openai_responses — to aggregate the upstream
Responses SSE into a synthetic Responses JSON and then run the
regular responses_to_anthropic non-streaming transform. All other
paths, including the generic response.is_sse() fallback, remain
on the streaming path from #2235.
The aggregator reuses proxy::sse::take_sse_block / strip_sse_field,
which support both \n\n and \r\n\r\n delimiters; a hand-rolled
split("\n\n") would silently fail on real HTTPS upstreams.
Tests cover the happy path, CRLF delimiters, response.failed
errors, and the missing response.completed defensive branch.
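Why a plain `split("\n\n")` is not enough can be seen in a small delimiter-aware splitter. This is a stand-in written for illustration; it is not the project's `take_sse_block` / `strip_sse_field`, and the names and signatures below are assumptions.

```rust
/// Remove and return the next complete SSE block from `buf`, accepting
/// either "\n\n" or "\r\n\r\n" as the terminator. Returns None when no
/// complete block is buffered yet.
pub fn take_block(buf: &mut String) -> Option<String> {
    let lf = buf.find("\n\n").map(|i| (i, 2));
    let crlf = buf.find("\r\n\r\n").map(|i| (i, 4));
    let (idx, len) = match (lf, crlf) {
        (Some(a), Some(b)) => {
            if a.0 <= b.0 {
                a
            } else {
                b
            }
        }
        (Some(a), None) => a,
        (None, Some(b)) => b,
        (None, None) => return None,
    };
    let block = buf[..idx].to_string();
    buf.drain(..idx + len);
    Some(block)
}

/// Return the payload of the first `data:` line in a block, if any.
pub fn data_field(block: &str) -> Option<&str> {
    block
        .lines() // `lines` also strips a trailing '\r'
        .find_map(|l| l.strip_prefix("data:"))
        .map(str::trim_start)
}
```

A CRLF-delimited stream contains no "\n\n" sequence at all, so a splitter that only looks for "\n\n" would buffer forever without yielding a block.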
---------
Co-authored-by: Jason <farion1231@gmail.com>
* fix(codex): use TOML parser instead of regex for model extraction
The regex only matched model=... on the first line; the TOML parser
handles multiline TOML correctly.
Fixes #2222
* fix(stream_check): drop unused regex::Regex import
The previous commit replaced the only Regex usage in stream_check.rs
with toml::Table parsing, leaving `use regex::Regex;` orphaned.
Without this removal, `cargo clippy -- -D warnings` (run in CI)
fails with `unused import: regex::Regex`.
---------
Co-authored-by: Jason <farion1231@gmail.com>