Add Unity2.ai, a high-performance AI API relay partner, as a preset for
Claude, Codex, Gemini, OpenCode, OpenClaw, Claude Desktop, and Hermes.
Each preset carries the referral signup link as apiKeyUrl.
- Register the unity2 icon via iconUrls (PNG URL import) + metadata
- Add partnerPromotion copy in zh/en/ja/zh-TW; backfill the missing
zh-TW ccsub entry
- List Unity2.ai in the sponsor section of all README locales
- Codex uses the bare base URL (gateway exposes /responses at root);
OpenCode/OpenClaw/Hermes use the /v1 chat-completions endpoint with
gpt-5.5 as the only preset model
- Trim CCSub OpenCode/OpenClaw/Hermes model lists to gpt-5.5 to match
- Normalize unity2/ccsub banners to the standard 2.41 aspect ratio
The Claude/Codex format-transform non-stream branch returned an opaque 422
"Failed to parse upstream response" whenever a 2xx upstream body was not
valid JSON. The common case: MaaS gateways force-stream a stream:false
request and return an SSE body with a non-SSE Content-Type, defeating the
header-only is_sse() check.
On serde failure, sniff for SSE and aggregate the chunks into a single
JSON, then run the existing converter so clients still receive a valid
non-stream response.
- chat_sse_to_response_value: aggregate chat.completion.chunk SSE
(content / reasoning / refusal / tool_calls / legacy function_call),
tool_calls index-keyed via BTreeMap to avoid unbounded densification,
first-wins finish_reason, message-snapshot override, completeness and
error-event guards; synthesize an id when the upstream omits one
- responses_sse_to_response_value: process the residual trailing block,
tolerating truncation and skipping it once a completed event was seen
- enrich remaining parse failures with content-type / content-encoding /
body-snippet diagnostics
- deflate: try zlib (RFC 9110) before raw; keep the content-encoding
header for unsupported encodings
- gate zero-usage rows on the Claude transform path
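The body-sniffing step described above can be sketched as follows. This is a minimal illustration and not the actual implementation: `looks_like_sse` and `sse_data_payloads` are hypothetical names, and the real code goes on to aggregate the extracted chunks into a single JSON value, which is omitted here.

```rust
/// Returns true when a body that failed JSON parsing looks like an SSE
/// stream, regardless of what the Content-Type header claimed.
/// (Hypothetical helper; the check is deliberately body-only.)
fn looks_like_sse(body: &str) -> bool {
    body.lines()
        .map(str::trim_start)
        .find(|line| !line.is_empty())
        .map(|line| {
            line.starts_with("data:") || line.starts_with("event:") || line.starts_with(':')
        })
        .unwrap_or(false)
}

/// Collects the payload of every `data:` line, stopping at the `[DONE]`
/// sentinel. Each payload would then be parsed as one chunk.
fn sse_data_payloads(body: &str) -> Vec<&str> {
    let mut payloads = Vec::new();
    for line in body.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("data:") {
            let payload = rest.trim();
            if payload == "[DONE]" {
                break;
            }
            if !payload.is_empty() {
                payloads.push(payload);
            }
        }
    }
    payloads
}
```

Sniffing the first non-empty line keeps the fallback cheap: ordinary JSON bodies never reach it, because it only runs after the serde parse has already failed.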
- Fold local routing toggle, model mapping, reasoning overrides and custom
User-Agent into a single collapsible advanced section, mirroring the
Claude form (auto-expands when UA is set or local routing is enabled)
- Custom User-Agent becomes configurable for native Responses providers;
it was previously reachable only when openai_chat routing was on
- Collapsed hint names local routing as the entry point for Chat
Completions / non-GPT providers
- Backfill all missing codexConfig keys in zh-TW locale
Polish the provider-level User-Agent override UI on the Claude and Codex forms.
- Add a shared CustomUserAgentField (label + input + preset dropdown + live
validation) so both forms stay in sync.
- Provide curated UA presets (Claude Code / Kilo Code families that pass
coding-plan UA whitelists per #3671); the first is Claude Code's real
`claude-cli/x (external, cli)` format. Whitelists gate on the name prefix,
not the version, so static values stay valid across upgrades.
- Expose presets via a dropdown to the right of the input (z-[200] so it
renders above the dialog layers) instead of inline chips.
- Move the field into the existing advanced/reasoning collapsibles.
- userAgent.ts mirrors the backend byte rule (reject only control chars;
non-ASCII is allowed) for a non-blocking inline hint.
- i18n for all four locales (zh/en/ja/zh-TW).
Extract a shared `parse_custom_user_agent` helper in provider.rs returning
`Result<Option<HeaderValue>>`, and reuse it in the forwarder, stream check,
and model fetch paths so detection, forwarding, and model listing all apply
the same provider-level User-Agent. Previously only the forwarder honored it,
so stream check could fail (or model listing 403) on UA-gated upstreams that
the proxy itself handled fine.
- stream_check injects the provider's custom UA on the claude/codex paths and
still skips the GitHub Copilot fingerprint UA.
- model_fetch service + command and the model-fetch.ts wrapper thread an
optional UA through to GET /v1/models.
- runtime callers silently ignore invalid values via `.ok().flatten()`
(no save-time block, so deeplink imports stay lenient).
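The lenient runtime behavior can be sketched like this. It is a simplified stand-in: the real helper returns `Result<Option<HeaderValue>>`, whereas this sketch uses `String` to stay dependency-free, and its control-character rule only approximates the backend byte rule.

```rust
/// Simplified stand-in for the shared helper.
/// Ok(None): no UA configured. Ok(Some): usable value. Err: rejected value.
fn parse_custom_user_agent(raw: Option<&str>) -> Result<Option<String>, String> {
    let Some(value) = raw.map(str::trim).filter(|v| !v.is_empty()) else {
        return Ok(None);
    };
    // Assumption: reject any control character; non-ASCII is allowed.
    if value.chars().any(|c| c.is_control()) {
        return Err(format!("user agent contains a control character: {value:?}"));
    }
    Ok(Some(value.to_string()))
}

/// Runtime callers drop invalid values instead of failing the request.
fn runtime_user_agent(raw: Option<&str>) -> Option<String> {
    parse_custom_user_agent(raw).ok().flatten()
}
```

`.ok()` turns the error into `None` and `.flatten()` collapses the nested `Option`, so an invalid value behaves exactly like an unset one at request time.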
- add claude-desktop to AppType/KNOWN_APP_TYPES and the dashboard app
filter; it was hidden because its rows looked like pure failure
noise, which was the app_type attribution bug fixed on the backend
- request detail panel now shows the requested model and the pricing
model when they differ from the response model, making route-takeover
bills auditable from the UI
- locale keys added for zh/en/ja/zh-TW
The model mapped for takeover (env mapping, Claude Desktop routes,
Copilot normalization, Codex chat override) was discarded inside the
forwarder, so usage attribution depended entirely on the upstream
echoing it back. When the upstream omitted the model or mirrored the
client alias, kimi/glm tokens were recorded and priced as claude-*
(roughly 5-25x overstatement).
- capture the final outbound model in forward(), return it via
ForwardResult, and store it on the request context
- attribution fallback order is now: upstream echo (empty string
treated as missing) -> outbound model -> client-requested model
- 'request' pricing mode anchors to the outbound model instead of the
pre-mapping client alias; unchanged when no mapping applies
- persist the resolved pricing_model on every usage row
- Claude Desktop rows now log app_type "claude-desktop" on streaming
and transform paths too (was hardcoded "claude", silently dropping
desktop provider pricing overrides and splitting the cost basis by
the stream flag); its global pricing defaults inherit the claude
config since proxy_config only allows claude/codex/gemini rows
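The attribution fallback order listed above can be expressed as a small function. This is a sketch under assumed names (`attributed_model` and its parameters are hypothetical), not the forwarder's actual code.

```rust
/// Picks the model to attribute usage to:
/// upstream echo (empty string counts as missing) -> outbound model ->
/// client-requested model.
fn attributed_model<'a>(
    upstream_echo: Option<&'a str>,
    outbound: Option<&'a str>,
    requested: &'a str,
) -> &'a str {
    upstream_echo
        .filter(|m| !m.is_empty())
        .or(outbound.filter(|m| !m.is_empty()))
        .unwrap_or(requested)
}
```

With the outbound model in the chain, a takeover request whose upstream omits the model no longer falls straight through to the client alias.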
- proxy_request_logs: add pricing_model column recording the basis actually
used at write time (NULL = pre-v11 rows, '' = unpriced error rows)
- cost backfill recomputes strictly by the persisted basis; the
request_model fallback now only applies to placeholder models, so
real-but-unpriced takeover rows stay at zero cost until pricing is
added instead of being permanently frozen at the alias's price
- backfill_missing_usage_costs_for_model can locate rows by pricing_model
- usage_daily_rollups: rebuild with request_model + pricing_model in the
primary key so the alias-to-real-model mapping and the pricing basis
survive the 30-day prune; legacy rows migrate with ''
- rollup_and_prune backfills costs before pruning: prune is irreversible
and used to run before the startup backfill, permanently booking
then-unpriced rows as zero
- get_model_stats groups by the effective pricing model
(COALESCE(NULLIF(pricing_model,''), model)) so costs aggregate under
the model whose prices produced them; response-mode behavior unchanged
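For readers less familiar with the SQL idiom, `COALESCE(NULLIF(pricing_model,''), model)` is equivalent to the following Rust sketch (the function name is illustrative only):

```rust
/// Mirrors COALESCE(NULLIF(pricing_model, ''), model):
/// NULL (pre-v11 rows) and '' (unpriced error rows) both fall back to `model`.
fn effective_pricing_model<'a>(pricing_model: Option<&'a str>, model: &'a str) -> &'a str {
    pricing_model.filter(|p| !p.is_empty()).unwrap_or(model)
}
```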
The Zhipu quota API returns two TOKENS_LIMIT entries whose identity was
inferred by sorting nextResetTime ascending (nearest = five_hour). In the
last hours of each weekly cycle the weekly window resets sooner than the
current 5-hour session window, so the two buckets were swapped exactly
when users check their weekly quota most.
Classify by the explicit unit field instead (3 = hour window -> five_hour,
6 = week window -> weekly_limit; same shape on bigmodel.cn and api.z.ai,
weekly observed with number 7 and 1 so only unit is matched), falling back
to the old reset-time heuristic when the field is missing.
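A minimal sketch of the classification, assuming each entry is reduced to a `(unit, next_reset_time)` pair. The type and function names are hypothetical, and the real parser works on the full JSON entries.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum QuotaWindow {
    FiveHour,
    Weekly,
}

/// unit 3 = hour window, unit 6 = week window; anything else is unknown.
fn classify_by_unit(unit: Option<u32>) -> Option<QuotaWindow> {
    match unit {
        Some(3) => Some(QuotaWindow::FiveHour),
        Some(6) => Some(QuotaWindow::Weekly),
        _ => None,
    }
}

/// Returns (five_hour_index, weekly_index) for the two TOKENS_LIMIT entries.
fn assign_windows(entries: [(Option<u32>, u64); 2]) -> (usize, usize) {
    let kinds = [classify_by_unit(entries[0].0), classify_by_unit(entries[1].0)];
    match kinds {
        [Some(QuotaWindow::FiveHour), Some(QuotaWindow::Weekly)] => (0, 1),
        [Some(QuotaWindow::Weekly), Some(QuotaWindow::FiveHour)] => (1, 0),
        // Unit missing or ambiguous: old heuristic, nearest reset = five_hour.
        _ => {
            if entries[0].1 <= entries[1].1 {
                (0, 1)
            } else {
                (1, 0)
            }
        }
    }
}
```

The first test case below is the end-of-week situation from the description: the weekly window resets sooner, yet it is still classified as weekly because the unit field wins over the reset time.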
- Replace the fixed Zap glyph in the usage hero with the selected app's
brand icon via a new AppGlyph component, reusing APP_ICON_MAP
(cloneElement scales 14px -> 20px); falls back to Zap for the "all" view.
- Recolor the Codex title theme from emerald to neutral gray to match
OpenAI's monochrome branding. neutral-500/10 stays visible in both
light and dark modes, unlike a flat black tint.
Codex /responses requests routed to text-only OpenAI-chat upstreams
(e.g. DeepSeek deepseek-v4-flash) failed with HTTP 400 "unknown variant
image_url" when images were sent: the responses->chat conversion turns
input_image items into image_url blocks the model rejects. The media
rectifier previously covered only the Claude adapter, so neither the
proactive strip nor the reactive retry fired for Codex.
- media_retry_should_trigger: accept "Codex" adapter, not just "Claude"
- contains_image_blocks / replace_images: also scan responses `input`
(input_image) in addition to chat `messages`
- is_image_block_type: match image | image_url | input_image
- is_unsupported_image_error: add "unknown variant" hint for the
deserialize error
- forward(): proactively run apply_media_prevention for Codex after the
responses->chat conversion
Proactively strips images for known text-only models (heuristic on by
default) and reactively retries with images replaced on upstream
image-unsupported errors. Adds tests for chat image_url, codex
input_image, the reactive trigger, and the deserialize error match.
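The two widened predicates could look roughly like this. The signatures are simplified assumptions; the real functions take richer arguments than a single string.

```rust
/// Block types treated as images across Claude, chat, and responses shapes.
fn is_image_block_type(kind: &str) -> bool {
    matches!(kind, "image" | "image_url" | "input_image")
}

/// Adapters for which the reactive media retry may fire.
/// (Simplified: the real trigger presumably checks more than the adapter name.)
fn adapter_supports_media_retry(adapter: &str) -> bool {
    matches!(adapter, "Claude" | "Codex")
}
```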
Builds on #2774 (which fixed cache_read for the streaming openai_chat path).
Two gaps remained, both double-counting cache tokens when a Claude client
meters as app_type="claude" (input_includes_cache_read=false):
1. cache_read was still added to input on the non-streaming openai_chat path
(transform.rs openai_to_anthropic) and the whole openai_responses family
(transform_responses.rs build_anthropic_usage_from_responses, covering the
non-streaming call site and both streaming_responses call sites).
2. cache_creation was never subtracted on any converted path, including the
streaming openai_chat path #2774 had already touched. Claude billing treats
cache_creation as a separate bucket, so an inclusive upstream carrying a
direct cache_creation_input_tokens field billed it twice.
All four metering points now compute:
input = prompt_tokens - cache_read - cache_creation
restoring the invariant input + cache_read + cache_creation == prompt_tokens.
Pure OpenAI upstreams are unaffected (no cache_creation concept/field).
Tests: update direct-cache assertions (40->20), add a streaming conservation
regression test, and pin prompt<cache underflow (saturating clamp to 0) for all
three metering functions. cargo test 1573 pass, clippy clean.
Note: fix is forward-only; historical rows are not recomputed (cost is frozen at
log time and app_type="claude" mixes native + converted rows).
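The metering formula and its underflow clamp can be sketched as a single function (the name `fresh_input_tokens` is illustrative, not taken from the code base):

```rust
/// input = prompt_tokens - cache_read - cache_creation, clamped at 0 when an
/// upstream reports cache counts larger than the prompt total.
fn fresh_input_tokens(prompt_tokens: u64, cache_read: u64, cache_creation: u64) -> u64 {
    prompt_tokens
        .saturating_sub(cache_read)
        .saturating_sub(cache_creation)
}
```

Whenever no clamping occurs, `input + cache_read + cache_creation == prompt_tokens` holds by construction.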
Audited all proxy format-conversion paths (Chat<->Message, Chat<->Response,
Gemini<->Message) for usage/cache metering. Five issues found and fixed.
The dedup mechanism (request_id PK, proxy/session source isolation) is
untouched, so no double-counting is introduced.
- A (Claude + openai_chat, streaming): inject stream_options.include_usage
so OpenAI-compatible upstreams emit usage in the SSE tail. Without it the
converted Anthropic message_delta was all-zero and the whole request's
input/output/cache was dropped. Same root cause as the already-fixed
Codex Chat path; the injection is extracted into a shared helper
(transform::inject_openai_stream_include_usage) reused by both paths.
- C (Claude + gemini_native): subtract cachedContentTokenCount from
input_tokens in build_anthropic_usage so input becomes fresh input
(Anthropic semantics). Previously the cache-hit tokens were billed twice
because this path meters as app_type="claude" (input_includes_cache_read
= false) while Gemini's promptTokenCount includes the cache.
- D (Codex + openai_chat, streaming): gate log_usage on
has_billable_tokens() to skip the synthetic all-zero usage the converter
emits when a non-compliant upstream omits usage, preventing empty-row
request-count inflation.
- P2 (from_claude_stream_events): use has_billable_tokens() for the return
gate instead of input>0||output>0, so a fully-cached streamed request
(cache_read>0, input==output==0) is still recorded. Affects all
Claude-streaming paths, not just Gemini.
- P3 (Codex Chat->Responses, non-streaming): apply the same
has_billable_tokens() filter the streaming branch got, since the
synthesized all-zero usage makes from_codex_response return Some and
bypass the `if let Some` guard.
Add TokenUsage::has_billable_tokens() as the unified predicate. New tests
cover include_usage injection, gemini input subtraction, the gate itself,
cache-only stream recording, and synthetic all-zero codex usage.
Full lib suite: 1569 passed.
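A sketch of the unified predicate, assuming a four-field usage struct. The field names are guesses made for illustration; only the method name comes from this change.

```rust
#[derive(Debug, Default, Clone, Copy)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl TokenUsage {
    /// True when any bucket is non-zero, so cache-only requests still count
    /// while synthetic all-zero usage is skipped.
    fn has_billable_tokens(&self) -> bool {
        self.input_tokens > 0
            || self.output_tokens > 0
            || self.cache_read_tokens > 0
            || self.cache_creation_tokens > 0
    }
}
```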
The local session-log scanner dropped any assistant message that lacked
a stop_reason or had output_tokens==0. Claude Code Workflow / sub-agent
fan-out frequently produces messages that only wrote a message_start
snapshot (output=1, stop_reason=None) without a final block, yet their
input + cache_read + cache_creation tokens are already billed by
Anthropic (charged once the request is accepted). Dropping them
under-counted usage by ~4.1% overall, 92% concentrated in
workflow/subagent transcripts.
Replace the stop_reason/output gate with a billable-token check (any of
input/output/cache_read/cache_creation > 0). The per-message-id dedup
selection is unchanged, and request_id = "session:"+msg_id PRIMARY KEY
with INSERT OR IGNORE keeps each message single-inserted, so relaxing
the gate cannot double-count. Add a regression test covering a
stop_reason-less message with real cache cost plus an all-zero skip.
This is the parser-layer half of the Workflow under-counting fixed at
the collector layer in 8d332925.
collect_jsonl_files only walked <project>/<session>/subagents/*.jsonl,
so it missed Workflow sub-agent transcripts which live one level deeper
at subagents/workflows/wf_*/agent-*.jsonl. As a result all Workflow
token usage was invisible to the no-proxy session-log accounting.
Descend into subagents/workflows/wf_*/ as well, via a new
push_jsonl_children helper that keeps the fixed-depth, no-recursion
design. journal.jsonl carries no assistant rows so it is skipped at
parse time and needs no filename special-casing. Existing dedup
(request_id PK + INSERT OR IGNORE + should_skip_session_insert) keeps
the next sync's backfill idempotent.
Add test_collect_jsonl_files_includes_workflow_subagents.
The JS-script usage path resolved {{apiKey}}/{{baseUrl}} with env-only
field guessing, so apps that store credentials elsewhere (Codex:
auth.OPENAI_API_KEY + config.toml base_url) always got empty values and
custom-template queries failed despite a fully configured provider.
- query_usage / test_usage_script now delegate to
Provider::resolve_usage_credentials, the same per-app resolver used by
the native balance/coding-plan path and mirrored by the frontend
getProviderCredentials; explicit non-empty script values still win
- test_usage_script loads the provider and applies the same fallback,
so testing matches what a saved script does
- the custom-template variable preview shows the effective values
(script overrides first, then provider config) instead of always
showing provider credentials
- extract_codex_base_url documents and test-locks the frontend-mirror
invariant: non-active [model_providers.*] sections are never read
Reworked from the original patch to reuse the existing resolver instead
of duplicating per-app extraction.
Co-authored-by: Jason <farion1231@gmail.com>
* fix: prevent duplicate YAML keys in Hermes config
Three changes in hermes_config.rs:
1. deduplicate_top_level_keys() - scan and remove duplicate top-level
keys before YAML parsing, preventing "duplicate entry" parse errors
2. remove_all_sections() - helper to strip all occurrences of a given
top-level key from raw YAML text
3. replace_yaml_section() now calls remove_all_sections() on the
remainder after replacing the primary occurrence, preventing
duplicate sections from accumulating on repeated writes
Fixes the issue where mcp_servers (or any top-level key) gets
duplicated in config.yaml, causing "Failed to parse Hermes config
as YAML: duplicate entry with key" errors.
Co-Authored-By: que3sui <204201112+que3sui@users.noreply.github.com>
* fix: handle CRLF and LF line endings in top-level key deduplication
is_top_level_key_line only accepted empty, space, or tab after the colon,
but deduplicate_top_level_keys uses split_inclusive('\n'), so lines end
with \n (LF) or \r\n (CRLF). Without accepting \r and \n as valid
post-colon characters, the dedup safety net never activates.
Add \r and \n checks to is_top_level_key_line, and three tests covering
LF, CRLF, and first-occurrence preservation.
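A sketch of the corrected check follows. The rules for comments, list items, and indentation are assumptions added to make the example complete; only the post-colon `\r` / `\n` handling is taken from this fix.

```rust
/// True when `line` (possibly still carrying its line ending, as produced by
/// split_inclusive('\n')) starts a top-level `key:` entry.
fn is_top_level_key_line(line: &str) -> bool {
    // Assumption: indented lines, comments, and list items are never keys.
    let Some(first) = line.chars().next() else {
        return false;
    };
    if first.is_whitespace() || first == '#' || first == '-' {
        return false;
    }
    match line.find(':') {
        Some(idx) if idx > 0 => match line[idx + 1..].chars().next() {
            None => true,
            // '\r' and '\n' are the cases the original check missed.
            Some(c) => matches!(c, ' ' | '\t' | '\r' | '\n'),
        },
        _ => false,
    }
}
```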
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(hermes): keep last occurrence when healing duplicate YAML keys
Reworks the healing layers on top of the CRLF root-cause fix:
- deduplicate_top_level_keys: keep the LAST occurrence of each duplicated
key instead of the first. Duplicates come from section replacement
degrading into appends (#3633), so the last block is the newest data --
and Hermes itself reads the config with PyYAML, whose duplicate-key
semantics are last-wins. Keeping the first occurrence would silently
roll users back to stale config and diverge from what Hermes runs with.
Healthy files take a fast path and are returned untouched.
- Drop the unused dup_key variable (fails cargo clippy -- -D warnings,
which CI enforces).
- replace_yaml_section: clean residual duplicate sections from the
remainder via remove_all_sections; values come from the keep-last
healed read, so dropping all stale on-disk copies loses nothing.
- Add regression tests for the actual root cause (find/replace on CRLF
input must replace in place, not append), keep-last semantics,
identity on healthy files, end-to-end heal-then-parse, and duplicate
cleanup on write.
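The keep-last healing can be sketched as below. This is a simplified illustration rather than the shipped implementation: it omits the fast path for healthy files, and `top_level_key` is a hypothetical helper.

```rust
/// Returns the key name when `line` starts a top-level `key:` entry.
fn top_level_key(line: &str) -> Option<&str> {
    let first = line.chars().next()?;
    if first.is_whitespace() || first == '#' || first == '-' {
        return None;
    }
    let idx = line.find(':')?;
    if idx == 0 {
        return None;
    }
    match line[idx + 1..].chars().next() {
        None | Some(' ') | Some('\t') | Some('\r') | Some('\n') => Some(&line[..idx]),
        _ => None,
    }
}

/// Drops every top-level section that is repeated later in the file, so the
/// last occurrence wins (matching PyYAML's duplicate-key semantics).
fn deduplicate_top_level_keys(yaml: &str) -> String {
    // Group lines into blocks, each starting at a top-level key line.
    let mut blocks: Vec<(Option<String>, String)> = Vec::new();
    for line in yaml.split_inclusive('\n') {
        if let Some(key) = top_level_key(line) {
            blocks.push((Some(key.to_string()), line.to_string()));
        } else if let Some(last) = blocks.last_mut() {
            last.1.push_str(line);
        } else {
            // Preamble before the first key (comments, blank lines).
            blocks.push((None, line.to_string()));
        }
    }
    let mut out = String::new();
    for (i, (key, text)) in blocks.iter().enumerate() {
        let superseded = key.is_some() && blocks[i + 1..].iter().any(|(k, _)| k == key);
        if !superseded {
            out.push_str(text);
        }
    }
    out
}
```

A file without duplicates round-trips byte for byte, because every block is copied back verbatim in its original order.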
Fixes #3633 #2973 #2529 #3310 #3762
---------
Co-authored-by: que3sui <204201112+que3sui@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Jason <farion1231@gmail.com>
Problem: Kimi and Moonshot preset links were user-clickable without the cc-switch affiliate query.

Decision: Update only UI-facing preset website/API-key links and leave API request endpoints untouched.

Change: Add aff=cc-switch to Kimi/Moonshot websiteUrl values and Codex/OpenCode API-key links.
Co-authored-by: xumingyuan <xumingyuan@msh.team>
Move the CCSub preset to sit right after DouBaoSeed, at the end of the
partner block and before the first non-partner provider, so its position
is consistent across all six apps:
- Codex / OpenCode: moved up from the 2nd slot (between Shengsuanyun and
the next partner) to the block tail
- OpenClaw / Hermes: moved up from the aggregator section to the block tail
- Claude / Claude Desktop: already at the block tail
Also add the missing CHANGELOG entry for the CCSub preset, and drop the
provider preset order test that enforced a now-unneeded ordering invariant.
Add CCSub, a multi-model aggregator partner, as a preset for Claude, Codex, OpenCode, OpenClaw, Claude Desktop, and Hermes. Each preset carries the referral signup link as apiKeyUrl.
- Register the ccsub icon via iconUrls (1.1MB SVG URL import) + metadata
- Add partnerPromotion copy in zh/en/ja
- List CCSub in the sponsor section of all README locales
- Use gpt-5.5 and gemini-3.1-pro as the OpenAI/Gemini model ids
Add the v3.16.2 CHANGELOG entry covering the 41 commits since v3.16.1,
bump the version across package.json, tauri.conf.json, Cargo.toml, and
Cargo.lock, and add trilingual (zh/en/ja) release notes.
* fix(proxy): strip cache_control from OpenAI format conversion (#3805)
- Remove cache_control passthrough from system messages, text blocks,
and tools to prevent 400 errors on strict OpenAI-compatible endpoints
- Always simplify single text block content to plain string format
- Fixes two format conversion bugs reported in issue #3805
* fix(proxy): apply cargo fmt to fix CI formatting check
The proxy-takeover block previously fell back to the isOfficial heuristic
(empty base_url / missing key) when category was absent. That misjudged
custom providers whose endpoint lives in meta or whose fields are simply
unfilled: their switch button got disabled, making users think the config
was broken. That extra UI block was also "virtual" — the executor in
useProviderActions only ever honored category === "official", so the
front end blocked more than the backend would enforce.
Gate the block solely on explicit category === "official", matching the
executor and unifying both verdicts on a single source of truth.
Also rework the blocked-state UI:
- drop the red "blocked" badge for a plain disabled Enable button
- move title/cursor onto a wrapper span (disabled buttons set
pointer-events:none, so an on-button title/cursor never fired)
- replace the account-ban warning tooltip with a lighter hint
(provider.blockedByProxyHint), four locales kept in sync
Convert Responses input_file and input_audio parts into their Chat Completions equivalents. input_file requires file_id or file_data; file_url is never accepted because Chat file parts do not support it. Also handle top-level input_* items that previously fell through and were dropped, clearing stale pending reasoning for non-assistant messages.
Replace the unconditional finalize at chat-to-responses stream end with a three-way guard: complete normally when finish_reason or [DONE] arrived, emit an incomplete response when substantive output exists without a finish_reason, and emit a failed (stream_truncated) event for empty truncation instead of masking it as completed. Also propagate late-arriving reasoning_content onto still-active tool-call items.
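The three-way guard can be summarized as a small decision function. It is a sketch with hypothetical names (`StreamEnd`, `classify_stream_end`); the real code emits Responses events instead of returning an enum.

```rust
#[derive(Debug, PartialEq)]
enum StreamEnd {
    /// finish_reason or [DONE] arrived: finalize normally.
    Completed,
    /// Output exists but the stream ended without a finish_reason.
    Incomplete,
    /// Nothing substantive arrived: report the truncation instead of
    /// masking it as a completed response.
    Failed(&'static str),
}

fn classify_stream_end(saw_finish_reason: bool, saw_done: bool, has_output: bool) -> StreamEnd {
    if saw_finish_reason || saw_done {
        StreamEnd::Completed
    } else if has_output {
        StreamEnd::Incomplete
    } else {
        StreamEnd::Failed("stream_truncated")
    }
}
```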
Generalize the cross-turn reasoning cache in codex chat history from function_call only to the full tool-call triad (function_call, custom_tool_call, tool_search_call) and their *_output counterparts, so apply_patch and tool-search calls keep their reasoning_content when restored via previous_response_id.
Switch website/apiKey URLs to sssaicodeapi.com and replace base URL
nodes with node-hk.sssaicodeapi.com (default), node-hk.sssaiapi.com,
and node-cf.sssaicodeapi.com across all 7 app presets.
When listen_port is 0 the OS assigns the port at bind time, so the
configured value can no longer be trusted for building takeover URLs.
- server: read listener.local_addr() after bind and propagate the
actual port to the global proxy port, status, and ProxyServerInfo
- services: start the proxy before takeover when port is 0 so live
configs get the real port instead of :0, and persist the resolved
port back to the DB for DB-only URL paths; stop the pre-started
server on any takeover failure
- claude_desktop: reject an unresolved :0 port instead of emitting a
broken gateway URL
- build_proxy_urls: prefer the running server's port and error out if
the port is still 0
Add tests for takeover with an ephemeral port and the claude_desktop
:0 rejection; switch existing codex takeover tests to an ephemeral
port for isolation.
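Resolving the real port after bind can be illustrated with the standard library alone. The proxy itself uses an async listener, so treat this as a sketch: `bind_and_resolve_port` and `build_proxy_url` are hypothetical names, but `local_addr()` plays the same role.

```rust
use std::net::TcpListener;

/// Binds to the configured port (0 = let the OS pick) and returns the port
/// that was actually assigned.
fn bind_and_resolve_port(configured_port: u16) -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind(("127.0.0.1", configured_port))?;
    let actual_port = listener.local_addr()?.port();
    Ok((listener, actual_port))
}

/// Refuses to build a takeover URL from an unresolved port.
fn build_proxy_url(port: u16) -> Result<String, String> {
    if port == 0 {
        return Err("proxy port is still unresolved (0)".to_string());
    }
    Ok(format!("http://127.0.0.1:{port}"))
}
```

The listener must stay alive while the URL is in use; dropping it releases the port, which is why the sketch returns it alongside the number.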
* feat(proxy): add GET /v1/models endpoint for Codex CLI reachability check
Codex CLI probes GET /v1/models at startup. Without this endpoint the proxy
returns 404, causing Codex to fail before any request reaches the upstream
LLM.
Return an OpenAI-compatible model list derived from the cc-switch–managed
model catalog file.
Fixes #3812
* fix(proxy): return Codex catalog schema from /v1/models
Codex deserializes the response as a catalog with a top-level `models`
field, not the OpenAI `{"object":"list","data":[...]}` envelope.
Return the catalog file content directly so the format matches what
Codex expects.
Co-authored-by: Codex review bot
* fix(proxy): guard /v1/models against serving stale catalog
Only return the model catalog when config.toml still references it via
`model_catalog_json`. After switching to a provider without a custom
catalog, the old file lingers on disk — serving it unconditionally
would advertise the previous provider's models to Codex.
Co-authored-by: Codex review bot
* fix(proxy): match relative model_catalog_json in stale-guard
cc-switch writes `model_catalog_json = "cc-switch-model-catalog.json"`
(relative) via set_codex_model_catalog_json_field. Match on the
filename constant rather than the absolute path so the guard works
with both relative and absolute paths.
Co-authored-by: Codex review bot
* fix(proxy): parse model_catalog_json field instead of substring match
Replace raw config_text.contains() with proper TOML field parsing so
commented-out lines and stray mentions of the filename in other fields
don't defeat the stale guard. Also switch from contains() to exact
filename match (Path::new(val).file_name() == Some(...)) to stay
consistent with resolve_cc_switch_catalog_path in codex_config.rs.
Add log::debug! when the guard blocks serving so the operator can
distinguish "no models configured" from "guard blocked stale catalog".
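The exact-filename comparison can be sketched as follows. It assumes the `model_catalog_json` value has already been read from the parsed TOML, and `references_managed_catalog` is a hypothetical name.

```rust
use std::ffi::OsStr;
use std::path::Path;

const CATALOG_FILE_NAME: &str = "cc-switch-model-catalog.json";

/// True when the configured path points at the cc-switch catalog file,
/// whether it was written as a relative or an absolute path.
fn references_managed_catalog(model_catalog_json: Option<&str>) -> bool {
    model_catalog_json
        .map(|val| Path::new(val).file_name() == Some(OsStr::new(CATALOG_FILE_NAME)))
        .unwrap_or(false)
}
```

Comparing `file_name()` rather than using `contains()` avoids false positives from values that merely mention the filename.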
* refactor(proxy): reuse resolve_cc_switch_catalog_path in handle_models
Replace the inline config.toml parsing and filename match in
handle_models with the existing resolve_cc_switch_catalog_path helper
(now pub(crate)). This removes the duplicated stale-guard logic, keeps
a single source of truth for catalog-path ownership, and makes the
handler honor absolute model_catalog_json paths the same way Codex
live-setting import does.
---------
Co-authored-by: Jason <farion1231@gmail.com>
* Fix Codex VS Code session previews
* fix(codex): use last IDE request heading for session previews
A markdown heading inside the active selection / open file could precede the real injected request, so matching the first "## My request for Codex:" heading picked selection content instead of the user prompt. Scan for the last matching heading (the IDE injects the real request as the final section) on both the Rust title path and the frontend TOC preview path.
Add regression tests for the selection-heading case, and pin the known best-effort limitation when the request body itself repeats the heading.
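Scanning for the last heading can be sketched with `rfind`. The function name is hypothetical, and the real title path applies further trimming that is not shown here.

```rust
const REQUEST_HEADING: &str = "## My request for Codex:";

/// Returns the text after the LAST occurrence of the injected heading, so a
/// matching heading inside the selection or open file is ignored.
fn extract_ide_request(message: &str) -> Option<&str> {
    let start = message.rfind(REQUEST_HEADING)? + REQUEST_HEADING.len();
    let body = message[start..].trim();
    if body.is_empty() {
        None
    } else {
        Some(body)
    }
}
```

The known limitation mentioned above still applies: if the request body itself repeats the heading, only the text after that final repetition is returned.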
---------
Co-authored-by: Jason <farion1231@gmail.com>
On Windows, Path::strip_prefix produces backslash-separated relative
paths. The update-check matching logic uses rsplit('/') to extract the
install name, so subdirectory skills (e.g. skills/my-skill) never
matched and updates were silently skipped. Replace backslashes with
forward slashes when building the directory string.
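The mismatch and the fix can be reproduced with plain strings. Both helper names are illustrative, and the real code works on the result of `Path::strip_prefix`.

```rust
/// Normalizes a relative directory string to forward slashes.
fn normalize_relative_dir(rel: &str) -> String {
    rel.replace('\\', "/")
}

/// Extracts the install name the same way the update check does.
fn install_name(dir: &str) -> &str {
    dir.rsplit('/').next().unwrap_or(dir)
}
```

Without normalization, a Windows-style `skills\my-skill` has no `/` to split on, so the whole string is returned as the install name and never matches.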
Some Anthropic-compatible SSE providers (e.g. qwen, minimax) report the
full context (fresh + cached) as input_tokens in message_start, double
counting the cached portion that is also reported in
cache_read_input_tokens. This inflated the cacheable-input denominator
and pushed the displayed cache hit rate artificially low.
When a message_delta carries a smaller positive input_tokens, prefer it
over the message_start value and adopt the cache counts from the same
usage block to avoid double counting; fall back to the start cache
values when the delta omits them. Native Claude (no input in delta) and
OpenRouter-converted (input only in delta) paths are unchanged.
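The new preference rule can be sketched as below. The struct and function names are hypothetical, and the sketch models only the rule added here; the existing handling of the delta-only (OpenRouter-converted) path is not reproduced.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Usage {
    input: u64,
    cache_read: u64,
    cache_creation: u64,
}

#[derive(Debug, Clone, Copy, Default)]
struct DeltaUsage {
    input: Option<u64>,
    cache_read: Option<u64>,
    cache_creation: Option<u64>,
}

/// Prefers a smaller positive input_tokens from message_delta over the
/// message_start value, taking cache counts from the same usage block and
/// falling back to the start values when the delta omits them.
fn reconcile_usage(start: Usage, delta: DeltaUsage) -> Usage {
    match delta.input {
        Some(d) if d > 0 && d < start.input => Usage {
            input: d,
            cache_read: delta.cache_read.unwrap_or(start.cache_read),
            cache_creation: delta.cache_creation.unwrap_or(start.cache_creation),
        },
        // Native Claude (no input in delta) keeps the start values.
        _ => start,
    }
}
```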
Refs #3580
APINebula is an OpenAI-compatible relay (its base URL ends in /v1, matching
its Codex/OpenClaw/Hermes presets), but the OpenCode preset loaded the
@ai-sdk/openai package, which targets the OpenAI Responses API and fails
against chat-completions-only upstreams. Switch the npm field to
@ai-sdk/openai-compatible so requests use the OpenAI Chat Completions format.
Fixes #3701.
`query_zhipu` was hard-coded to `https://api.z.ai`, so a user who
configured the mainland China preset (`Zhipu GLM` on
`open.bigmodel.cn`) could not retrieve usage once the international
endpoint became unreachable from their network (or vice versa).
The two endpoints share the same quota path (`/api/monitor/usage/quota/limit`)
and return JSON in the same shape, and — crucially — each user only
ever uses one of them: the quota host is the same host they're already
running coding on. So we can route by the configured `base_url` and
skip the cross-host fallback entirely.
What this PR changes
--------------------
A single helper that maps the user's `base_url` to the matching quota
host, and `query_zhipu` rebuilt to take `base_url` and pick the right
host:
    fn zhipu_quota_base(base_url: &str) -> &'static str {
        if base_url.contains("bigmodel.cn") {
            "https://open.bigmodel.cn"
        } else {
            "https://api.z.ai"
        }
    }

    async fn query_zhipu(base_url: &str, api_key: &str) -> SubscriptionQuota {
        let url = format!(
            "{}/api/monitor/usage/quota/limit",
            zhipu_quota_base(base_url),
        );
        // ... original 401/403 -> Expired / make_error / parse path, unchanged
    }
The dispatcher already distinguishes `ZhipuCn` from `ZhipuEn` via
`detect_provider()` and routes the call through
`query_zhipu(base_url, api_key)` in the same match arm.
Why no cross-host fallback
--------------------------
Farion's review pointed out that adding a fallback would be
over-engineered and actively harmful:
1. Reachability is determined by the preset the user chose. Their
configured host is the host they are already using to run coding;
if it were unreachable, the user could not have reached the
"query usage" step at all.
2. The fallback path required distinguishing "both 401/403" (genuine
bad key) from "one 401/403 + one network error" (regional block),
which silently misclassified the second case as a generic query
failure and hid the upstream "Session expired" UX for invalid
keys.
3. It also cost the worst-case ~10s+10s≈20s serial timeout for users
on a working primary.
With the URL-based routing in place, 401/403 returns to the original
`CredentialStatus::Expired` semantics — same UX as `query_kimi` and
`query_minimax`.
Files changed
-------------
- `src-tauri/src/services/coding_plan.rs` — 1 file, +35 / -20
Testing
-------
- 3 new `zhipu_quota_base_*` routing tests
- 15 existing `coding_plan` parser tests still pass
- `cargo fmt --check` clean
- `cargo clippy --lib --no-deps -- -D warnings` clean
Co-authored-by: Yongmao Luo <yongmao.luo@columbia.edu>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
When a previous stop_with_restore() failed to restore the user's original
Live (e.g. app crash mid-stop, settings.json unwritable, or any pre-existing
state where Live carries the proxy placeholders), the next
start_with_takeover would read the still-placeholder Live and overwrite the
good backup row with the proxy config itself. After that, every subsequent
stop would restore the proxy placeholder back to Live — making the proxy
toggle a no-op and leaving the client pinned at http://127.0.0.1:15721.
Fix: in both backup write paths (`backup_live_configs` and
`backup_live_config_strict`) detect that Live is already a proxy
placeholder and skip the save, preserving any existing good backup. In
`restore_live_config_for_app_with_fallback_inner`, detect the same
condition in the parsed backup and fall through to the existing
SSOT (current provider DB) path that was added in c3d810a.
Both sides share a new `live_has_proxy_placeholder_for_app` dispatch
helper so the placeholder check stays in lockstep with the existing
per-app detection functions.
Co-authored-by: Yongmao Luo <yongmao.luo@columbia.edu>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>