Commit Graph

1917 Commits

  • feat(presets): add Unity2.ai partner provider across seven apps
    Add Unity2.ai, a high-performance AI API relay partner, as a preset for
    Claude, Codex, Gemini, OpenCode, OpenClaw, Claude Desktop, and Hermes.
    Each preset carries the referral signup link as apiKeyUrl.
    
    - Register the unity2 icon via iconUrls (PNG URL import) + metadata
    - Add partnerPromotion copy in zh/en/ja/zh-TW; backfill the missing
      zh-TW ccsub entry
    - List Unity2.ai in the sponsor section of all README locales
    - Codex uses the bare base URL (gateway exposes /responses at root);
      OpenCode/OpenClaw/Hermes use the /v1 chat-completions endpoint with
      gpt-5.5 as the only preset model
    - Trim CCSub OpenCode/OpenClaw/Hermes model lists to gpt-5.5 to match
    - Normalize unity2/ccsub banners to the standard 2.41 aspect ratio
  • fix(proxy): aggregate mislabeled SSE bodies in transform fallback (#2234)
    The Claude/Codex format-transform non-stream branch returned an opaque 422
    "Failed to parse upstream response" whenever a 2xx upstream body was not
    valid JSON. The common case: MaaS gateways force-stream a stream:false
    request and return an SSE body with a non-SSE Content-Type, defeating the
    header-only is_sse() check.
    
    On serde failure, sniff for SSE and aggregate the chunks into a single
    JSON, then run the existing converter so clients still receive a valid
    non-stream response.
    
    - chat_sse_to_response_value: aggregate chat.completion.chunk SSE
      (content / reasoning / refusal / tool_calls / legacy function_call),
      tool_calls index-keyed via BTreeMap to avoid unbounded densification,
      first-wins finish_reason, message-snapshot override, completeness and
      error-event guards; synthesize an id when the upstream omits one
    - responses_sse_to_response_value: process the residual trailing block,
      tolerating truncation and skipping it once a completed event was seen
    - enrich remaining parse failures with content-type / content-encoding /
      body-snippet diagnostics
    - deflate: try zlib (RFC 9110) before raw; keep the content-encoding
      header for unsupported encodings
    - gate zero-usage rows on the Claude transform path
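
The body-sniffing fallback described in this commit can be sketched as follows. This is a minimal illustration that assumes the body is already decoded to UTF-8; `looks_like_sse` and `sse_data_payloads` are hypothetical names, not the project's `is_sse()` or `chat_sse_to_response_value`, and the sketch stops before the JSON aggregation step.

```rust
/// Returns true when a body looks like Server-Sent Events even though the
/// Content-Type header did not say so. Only the first non-empty line is
/// inspected (an assumption made for this sketch).
pub fn looks_like_sse(body: &str) -> bool {
    body.lines()
        .map(str::trim_start)
        .find(|line| !line.is_empty())
        .map(|line| {
            line.starts_with("data:") || line.starts_with("event:") || line.starts_with(':')
        })
        .unwrap_or(false)
}

/// Collects the payload of each `data:` line, stopping at the `[DONE]`
/// sentinel. Each `data:` line is treated as one chunk; multi-line events
/// are not joined here.
pub fn sse_data_payloads(body: &str) -> Vec<&str> {
    let mut out = Vec::new();
    for line in body.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("data:") {
            let payload = rest.trim();
            if payload == "[DONE]" {
                break;
            }
            if !payload.is_empty() {
                out.push(payload);
            }
        }
    }
    out
}
```

A caller would try the JSON parse first and only sniff on failure, so well-formed non-stream responses pay no extra cost.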
  • feat(provider-form): consolidate codex form into advanced options section
    - Fold local routing toggle, model mapping, reasoning overrides and custom
      User-Agent into a single collapsible advanced section, mirroring the
      Claude form (auto-expands when UA is set or local routing is enabled)
    - Custom User-Agent becomes configurable for native Responses providers;
      it was previously reachable only when openai_chat routing was on
    - Collapsed hint names local routing as the entry point for Chat
      Completions / non-GPT providers
    - Backfill all missing codexConfig keys in zh-TW locale
  • feat(provider-form): custom User-Agent presets dropdown in advanced settings
    Polish the provider-level User-Agent override UI on the Claude and Codex forms.
    
    - Add a shared CustomUserAgentField (label + input + preset dropdown + live
      validation) so both forms stay in sync.
    - Provide curated UA presets (Claude Code / Kilo Code families that pass
      coding-plan UA whitelists per #3671); the first is Claude Code's real
      `claude-cli/x (external, cli)` format. Whitelists gate on the name prefix,
      not the version, so static values stay valid across upgrades.
    - Expose presets via a dropdown to the right of the input (z-[200] so it
      renders above the dialog layers) instead of inline chips.
    - Move the field into the existing advanced/reasoning collapsibles.
    - userAgent.ts mirrors the backend byte rule (reject only control chars;
      non-ASCII is allowed) for a non-blocking inline hint.
    - i18n for all four locales (zh/en/ja/zh-TW).
  • feat(proxy): honor custom User-Agent across stream check and model fetch
    Extract a shared `parse_custom_user_agent` helper in provider.rs returning
    `Result<Option<HeaderValue>>`, and reuse it in the forwarder, stream check,
    and model fetch paths so detection, forwarding, and model listing all apply
    the same provider-level User-Agent. Previously only the forwarder honored it,
    so stream check could fail (or model listing 403) on UA-gated upstreams that
    the proxy itself handled fine.
    
    - stream_check injects the provider's custom UA on the claude/codex paths and
      still skips the GitHub Copilot fingerprint UA.
    - model_fetch service + command and the model-fetch.ts wrapper thread an
      optional UA through to GET /v1/models.
    - runtime callers silently ignore invalid values via `.ok().flatten()`
      (no save-time block, so deeplink imports stay lenient).
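
A minimal sketch of the strict-parse, lenient-runtime split described above. The real helper returns `Result<Option<HeaderValue>>`; this stand-in uses `String` so it compiles with std only. The blank-means-unset trimming is an assumption, and `runtime_user_agent` is a hypothetical name.

```rust
/// Stand-in for the shared helper: `Ok(None)` when no override is set,
/// `Err` when the value contains a control character. Non-ASCII is allowed.
pub fn parse_custom_user_agent(raw: Option<&str>) -> Result<Option<String>, String> {
    // Assumption: blank or whitespace-only values count as "unset".
    let Some(value) = raw.map(str::trim).filter(|v| !v.is_empty()) else {
        return Ok(None);
    };
    if value.chars().any(char::is_control) {
        return Err(format!("User-Agent contains a control character: {value:?}"));
    }
    Ok(Some(value.to_owned()))
}

/// Runtime callers stay lenient: an invalid value behaves like "no override".
pub fn runtime_user_agent(raw: Option<&str>) -> Option<String> {
    parse_custom_user_agent(raw).ok().flatten()
}
```

Keeping the error in the return type lets a form surface a hint, while `.ok().flatten()` collapses both "unset" and "invalid" into `None` on the request path.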
  • fix: omit customUserAgent when provider category is official
    Stale custom UA values from non-official presets were persisted even
    after switching to an official preset, silently altering request headers.
  • feat(usage): claude-desktop filter and pricing-model audit display
    - add claude-desktop to AppType/KNOWN_APP_TYPES and the dashboard app
      filter; it was hidden because its rows looked like pure failure
      noise, which was the app_type attribution bug fixed on the backend
    - request detail panel now shows the requested model and the pricing
      model when they differ from the response model, making route-takeover
      bills auditable from the UI
    - locale keys added for zh/en/ja/zh-TW
  • fix(proxy): bill route-takeover traffic by the real upstream model
    The model mapped for takeover (env mapping, Claude Desktop routes,
    Copilot normalization, Codex chat override) was discarded inside the
    forwarder, so usage attribution depended entirely on the upstream
    echoing it back. When the upstream omitted the model or mirrored the
    client alias, kimi/glm tokens were recorded and priced as claude-*
    (roughly 5-25x overstatement).
    
    - capture the final outbound model in forward(), return it via
      ForwardResult, and store it on the request context
    - attribution fallback order is now: upstream echo (empty string
      treated as missing) -> outbound model -> client-requested model
    - 'request' pricing mode anchors to the outbound model instead of the
      pre-mapping client alias; unchanged when no mapping applies
    - persist the resolved pricing_model on every usage row
    - Claude Desktop rows now log app_type "claude-desktop" on streaming
      and transform paths too (was hardcoded "claude", silently dropping
      desktop provider pricing overrides and splitting the cost basis by
      the stream flag); its global pricing defaults inherit the claude
      config since proxy_config only allows claude/codex/gemini rows
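
The attribution fallback order listed above can be illustrated with a small function. This is a sketch only; `resolve_attributed_model` and its parameter names are hypothetical and do not come from the codebase.

```rust
/// Picks the model to attribute usage to:
/// upstream echo (empty string treated as missing) -> outbound model
/// -> client-requested model.
pub fn resolve_attributed_model<'a>(
    upstream_echo: Option<&'a str>,
    outbound_model: Option<&'a str>,
    client_requested: &'a str,
) -> &'a str {
    upstream_echo
        .filter(|m| !m.is_empty())
        .or(outbound_model)
        .unwrap_or(client_requested)
}
```

With this order, an upstream that omits the model or echoes an empty string no longer causes the row to be attributed to the client alias when a mapped outbound model is known.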
  • feat(usage): persist pricing basis and takeover dimensions in storage (schema v11)
    - proxy_request_logs: add pricing_model column recording the basis actually
      used at write time (NULL = pre-v11 rows, '' = unpriced error rows)
    - cost backfill recomputes strictly by the persisted basis; the
      request_model fallback now only applies to placeholder models, so
      real-but-unpriced takeover rows stay at zero cost until pricing is
      added instead of being permanently frozen at the alias's price
    - backfill_missing_usage_costs_for_model can locate rows by pricing_model
    - usage_daily_rollups: rebuild with request_model + pricing_model in the
      primary key so the alias-to-real-model mapping and the pricing basis
      survive the 30-day prune; legacy rows migrate with ''
    - rollup_and_prune backfills costs before pruning: prune is irreversible
      and used to run before the startup backfill, permanently booking
      then-unpriced rows as zero
    - get_model_stats groups by the effective pricing model
      (COALESCE(NULLIF(pricing_model,''), model)) so costs aggregate under
      the model whose prices produced them; response-mode behavior unchanged
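
For readers less familiar with the SQL idiom, `COALESCE(NULLIF(pricing_model,''), model)` behaves like the function below. This is an in-memory analogue for illustration, with a hypothetical name; the commit itself does the grouping in SQL.

```rust
/// Mirrors COALESCE(NULLIF(pricing_model, ''), model):
/// NULL (pre-v11 rows) and '' (unpriced error rows) both fall back to `model`.
pub fn effective_pricing_model<'a>(pricing_model: Option<&'a str>, model: &'a str) -> &'a str {
    match pricing_model {
        Some(p) if !p.is_empty() => p,
        _ => model,
    }
}
```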
  • fix(coding-plan): classify Zhipu quota windows by unit field instead of reset-time order (#3036)
    The Zhipu quota API returns two TOKENS_LIMIT entries whose identity was
    inferred by sorting nextResetTime ascending (nearest = five_hour). In the
    last hours of each weekly cycle the weekly window resets sooner than the
    current 5-hour session window, so the two buckets were swapped exactly
    when users check their weekly quota most.
    
    Classify by the explicit unit field instead (3 = hour window -> five_hour,
    6 = week window -> weekly_limit; same shape on bigmodel.cn and api.z.ai).
    The weekly entry has been observed with number values of both 7 and 1, so
    only unit is matched. Fall back to the old reset-time heuristic when the
    field is missing.
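
A sketch of the unit-first classification with the reset-time fallback. The struct and function names are hypothetical, and treating "only one entry has a recognized unit" as a fallback case is an assumption of this sketch.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum QuotaWindow {
    FiveHour,
    Weekly,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenLimit {
    pub unit: Option<u32>,
    pub next_reset_time: i64,
}

fn window_from_unit(unit: Option<u32>) -> Option<QuotaWindow> {
    match unit {
        Some(3) => Some(QuotaWindow::FiveHour), // hour window
        Some(6) => Some(QuotaWindow::Weekly),   // week window
        _ => None,
    }
}

/// Returns `(five_hour, weekly)` for the two TOKENS_LIMIT entries.
pub fn split_windows<'a>(
    a: &'a TokenLimit,
    b: &'a TokenLimit,
) -> (&'a TokenLimit, &'a TokenLimit) {
    match (window_from_unit(a.unit), window_from_unit(b.unit)) {
        (Some(QuotaWindow::FiveHour), Some(QuotaWindow::Weekly)) => (a, b),
        (Some(QuotaWindow::Weekly), Some(QuotaWindow::FiveHour)) => (b, a),
        // Unit missing or ambiguous: old heuristic, nearest reset = five_hour.
        _ => {
            if a.next_reset_time <= b.next_reset_time {
                (a, b)
            } else {
                (b, a)
            }
        }
    }
}
```

The first test case below is the end-of-week situation from the commit message: the weekly window resets sooner, yet the unit field still assigns it correctly.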
  • feat(usage): refresh model pricing seed — add Fable 5 + 8 models, fix 28 prices
    Full audit of seed_model_pricing against current official vendor pricing.
    
    New models: claude-fable-5 (10/50), grok-4.3, step-3.7-flash,
    mistral-medium-3.5, mistral-small-4, devstral-small-2-2512, magistral-small,
    qwen3.7-max, qwen3.7-plus.
    
    Price fixes (Chinese vendors standardized on official list price, CNY/~7.14):
    - GLM 4.6/4.7 -> Z.ai official 0.6/2.2/0.11 (were reseller/OpenRouter rates)
    - Grok 4.20 reasoning/non-reasoning -> 1.25/2.50 (xAI price cut)
    - MiMo v2.5 / v2.5-pro / v2-pro -> post-2026-05-27 rates + cache
    - Doubao Seed 2.0 lite corrected + cache-hit prices across the family
    - Kimi k2.5 output 3.00, MiniMax m2.5 input 0.15, Mistral devstral-2 output 2
    - Qwen 3.5/3.6-plus + coder-plus/flash cache_read (official 20%-of-input rule)
    
    Each fix updates the seed value (fresh installs) and adds an old->new guard to
    repair_current_model_pricing (existing DBs; won't clobber user-edited rows).
  • feat(usage): app-aware hero icon and neutral Codex theme
    - Replace the fixed Zap glyph in the usage hero with the selected app's
      brand icon via a new AppGlyph component, reusing APP_ICON_MAP
      (cloneElement scales 14px -> 20px); falls back to Zap for the "all" view.
    - Recolor the Codex title theme from emerald to neutral gray to match
      OpenAI's monochrome branding. neutral-500/10 stays visible in both
      light and dark modes, unlike a flat black tint.
  • fix(proxy): extend image rectifier to Codex /responses text-only path
    Codex /responses requests routed to text-only OpenAI-chat upstreams
    (e.g. DeepSeek deepseek-v4-flash) failed with HTTP 400 "unknown variant
    image_url" when images were sent: the responses->chat conversion turns
    input_image items into image_url blocks the model rejects. The media
    rectifier previously covered only the Claude adapter, so neither the
    proactive strip nor the reactive retry fired for Codex.
    
    - media_retry_should_trigger: accept "Codex" adapter, not just "Claude"
    - contains_image_blocks / replace_images: also scan responses `input`
      (input_image) in addition to chat `messages`
    - is_image_block_type: match image | image_url | input_image
    - is_unsupported_image_error: add "unknown variant" hint for the
      deserialize error
    - forward(): proactively run apply_media_prevention for Codex after the
      responses->chat conversion
    
    Proactively strips images for known text-only models (heuristic on by
    default) and reactively retries with images replaced on upstream
    image-unsupported errors. Adds tests for chat image_url, codex
    input_image, the reactive trigger, and the deserialize error match.
  • fix(proxy): exclude cache_read and cache_creation from input on Claude←OpenAI paths
    Builds on #2774 (which fixed cache_read for the streaming openai_chat path).
    Two gaps remained, both double-counting cache tokens when a Claude client
    meters as app_type="claude" (input_includes_cache_read=false):
    
    1. cache_read was still added to input on the non-streaming openai_chat path
       (transform.rs openai_to_anthropic) and the whole openai_responses family
       (transform_responses.rs build_anthropic_usage_from_responses, covering the
       non-streaming call site and both streaming_responses call sites).
    
    2. cache_creation was never subtracted on any converted path, including the
       streaming openai_chat path #2774 had already touched. Claude billing treats
       cache_creation as a separate bucket, so an inclusive upstream carrying a
       direct cache_creation_input_tokens field billed it twice.
    
    All four metering points now compute:
      input = prompt_tokens - cache_read - cache_creation
    restoring the invariant input + cache_read + cache_creation == prompt_tokens.
    Pure OpenAI upstreams are unaffected (no cache_creation concept/field).
    
    Tests: update direct-cache assertions (40->20), add a streaming conservation
    regression test, and pin prompt<cache underflow (saturating clamp to 0) for all
    three metering functions. cargo test 1573 pass, clippy clean.
    
    Note: fix is forward-only; historical rows are not recomputed (cost is frozen at
    log time and app_type="claude" mixes native + converted rows).
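
The metering formula and its underflow clamp can be written as a one-liner. The function name is hypothetical; the arithmetic follows the formula stated in this commit.

```rust
/// input = prompt_tokens - cache_read - cache_creation, clamped at 0 so a
/// non-compliant upstream reporting prompt < cache cannot underflow.
pub fn fresh_input_tokens(prompt_tokens: u64, cache_read: u64, cache_creation: u64) -> u64 {
    prompt_tokens
        .saturating_sub(cache_read)
        .saturating_sub(cache_creation)
}
```

When the upstream figures are consistent, `input + cache_read + cache_creation == prompt_tokens` holds; the clamp only matters for inconsistent reports.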
  • fix(proxy): correct usage accounting on format-conversion paths
    Audited all proxy format-conversion paths (Chat<->Message, Chat<->Response,
    Gemini<->Message) for usage/cache metering. Five issues found and fixed.
    The dedup mechanism (request_id PK, proxy/session source isolation) is
    untouched, so no double-counting is introduced.
    
    - A (Claude + openai_chat, streaming): inject stream_options.include_usage
      so OpenAI-compatible upstreams emit usage in the SSE tail. Without it the
      converted Anthropic message_delta was all-zero and the whole request's
      input/output/cache was dropped. Same root cause as the already-fixed
      Codex Chat path; the injection is extracted into a shared helper
      (transform::inject_openai_stream_include_usage) reused by both paths.
    
    - C (Claude + gemini_native): subtract cachedContentTokenCount from
      input_tokens in build_anthropic_usage so input becomes fresh input
      (Anthropic semantics). Previously the cache-hit tokens were billed twice
      because this path meters as app_type="claude" (input_includes_cache_read
      = false) while Gemini's promptTokenCount includes the cache.
    
    - D (Codex + openai_chat, streaming): gate log_usage on
      has_billable_tokens() to skip the synthetic all-zero usage the converter
      emits when a non-compliant upstream omits usage, preventing empty-row
      request-count inflation.
    
    - P2 (from_claude_stream_events): use has_billable_tokens() for the return
      gate instead of input>0||output>0, so a fully-cached streamed request
      (cache_read>0, input==output==0) is still recorded. Affects all
      Claude-streaming paths, not just Gemini.
    
    - P3 (Codex Chat->Responses, non-streaming): apply the same
      has_billable_tokens() filter the streaming branch got, since the
      synthesized all-zero usage makes from_codex_response return Some and
      bypass the `if let Some` guard.
    
    Add TokenUsage::has_billable_tokens() as the unified predicate. New tests
    cover include_usage injection, gemini input subtraction, the gate itself,
    cache-only stream recording, and synthetic all-zero codex usage.
    Full lib suite: 1569 passed.
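
A sketch of what a unified billable-token predicate might look like. The field names are assumptions, and "any of the four counters is positive" is inferred from how the gate is described in this log rather than taken from the actual `TokenUsage` definition.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,
    pub cache_creation_tokens: u64,
}

impl TokenUsage {
    /// True when at least one counter would produce a charge.
    pub fn has_billable_tokens(&self) -> bool {
        self.input_tokens > 0
            || self.output_tokens > 0
            || self.cache_read_tokens > 0
            || self.cache_creation_tokens > 0
    }
}
```

Compared with an `input > 0 || output > 0` check, this keeps a fully cached request (only `cache_read_tokens` set) while still skipping synthetic all-zero usage.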
  • fix(usage): import billable session messages without stop_reason
    The local session-log scanner dropped any assistant message that lacked
    a stop_reason or had output_tokens==0. Claude Code Workflow / sub-agent
    fan-out frequently produces messages that only wrote a message_start
    snapshot (output=1, stop_reason=None) without a final block, yet their
    input + cache_read + cache_creation tokens are already billed by
    Anthropic (charged once the request is accepted). Dropping them
    under-counted usage by ~4.1% overall, with 92% of the shortfall
    concentrated in workflow/subagent transcripts.
    
    Replace the stop_reason/output gate with a billable-token check (any of
    input/output/cache_read/cache_creation > 0). The per-message-id dedup
    selection is unchanged, and request_id = "session:"+msg_id PRIMARY KEY
    with INSERT OR IGNORE keeps each message single-inserted, so relaxing
    the gate cannot double-count. Add a regression test covering a
    stop_reason-less message with real cache cost plus an all-zero skip.
    
    This is the parser-layer half of the Workflow under-counting fixed at
    the collector layer in 8d332925.
  • fix(usage): count Claude Code Workflow sub-agent token usage
    collect_jsonl_files only walked <project>/<session>/subagents/*.jsonl,
    so it missed Workflow sub-agent transcripts which live one level deeper
    at subagents/workflows/wf_*/agent-*.jsonl. As a result all Workflow
    token usage was invisible to the no-proxy session-log accounting.
    
    Descend into subagents/workflows/wf_*/ as well, via a new
    push_jsonl_children helper that keeps the fixed-depth, no-recursion
    design. journal.jsonl carries no assistant rows so it is skipped at
    parse time and needs no filename special-casing. Existing dedup
    (request_id PK + INSERT OR IGNORE + should_skip_session_insert) keeps
    the next sync's backfill idempotent.
    
    Add test_collect_jsonl_files_includes_workflow_subagents.
  • fix: usage script provider credential resolution (#1479)
    The JS-script usage path resolved {{apiKey}}/{{baseUrl}} with env-only
    field guessing, so apps that store credentials elsewhere (Codex:
    auth.OPENAI_API_KEY + config.toml base_url) always got empty values and
    custom-template queries failed despite a fully configured provider.
    
    - query_usage / test_usage_script now delegate to
      Provider::resolve_usage_credentials, the same per-app resolver used by
      the native balance/coding-plan path and mirrored by the frontend
      getProviderCredentials; explicit non-empty script values still win
    - test_usage_script loads the provider and applies the same fallback,
      so testing matches what a saved script does
    - the custom-template variable preview shows the effective values
      (script overrides first, then provider config) instead of always
      showing provider credentials
    - extract_codex_base_url documents and test-locks the frontend-mirror
      invariant: non-active [model_providers.*] sections are never read
    
    Reworked from the original patch to reuse the existing resolver instead
    of duplicating per-app extraction.
    
    Co-authored-by: Jason <farion1231@gmail.com>
  • fix: prevent duplicate YAML keys in Hermes config (#3267)
    * fix: prevent duplicate YAML keys in Hermes config
    
    Three changes in hermes_config.rs:
    1. deduplicate_top_level_keys() - scan and remove duplicate top-level
       keys before YAML parsing, preventing "duplicate entry" parse errors
    2. remove_all_sections() - helper to strip all occurrences of a given
       top-level key from raw YAML text
    3. replace_yaml_section() now calls remove_all_sections() on the
       remainder after replacing the primary occurrence, preventing
       duplicate sections from accumulating on repeated writes
    
    Fixes the issue where mcp_servers (or any top-level key) gets
    duplicated in config.yaml, causing "Failed to parse Hermes config
    as YAML: duplicate entry with key" errors.
    
    Co-Authored-By: que3sui <204201112+que3sui@users.noreply.github.com>
    
    * fix: handle CRLF and LF line endings in top-level key deduplication
    
    is_top_level_key_line only accepted empty, space, or tab after the colon,
    but deduplicate_top_level_keys uses split_inclusive('\n'), so lines end
    with \n (LF) or \r\n (CRLF). Without accepting \r and \n as valid
    post-colon characters, the dedup safety net never activates.
    
    Add \r and \n checks to is_top_level_key_line, and three tests covering
    LF, CRLF, and first-occurrence preservation.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * refactor(hermes): keep last occurrence when healing duplicate YAML keys
    
    Reworks the healing layers on top of the CRLF root-cause fix:
    
    - deduplicate_top_level_keys: keep the LAST occurrence of each duplicated
      key instead of the first. Duplicates come from section replacement
      degrading into appends (#3633), so the last block is the newest data --
      and Hermes itself reads the config with PyYAML, whose duplicate-key
      semantics are last-wins. Keeping the first occurrence would silently
      roll users back to stale config and diverge from what Hermes runs with.
      Healthy files take a fast path and are returned untouched.
    - Drop the unused dup_key variable (fails cargo clippy -- -D warnings,
      which CI enforces).
    - replace_yaml_section: clean residual duplicate sections from the
      remainder via remove_all_sections; values come from the keep-last
      healed read, so dropping all stale on-disk copies loses nothing.
    - Add regression tests for the actual root cause (find/replace on CRLF
      input must replace in place, not append), keep-last semantics,
      identity on healthy files, end-to-end heal-then-parse, and duplicate
      cleanup on write.
    
    Fixes #3633 #2973 #2529 #3310 #3762
    
    ---------
    
    Co-authored-by: que3sui <204201112+que3sui@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    Co-authored-by: Jason <farion1231@gmail.com>
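
The keep-last healing and the CRLF-aware key detection can be sketched on raw text as below. This is a simplified illustration, not the code in hermes_config.rs: it ignores quoted keys and other YAML subtleties, has no fast path for healthy files, and `top_level_key` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Returns the key when `line` starts a top-level mapping entry.
fn top_level_key(line: &str) -> Option<&str> {
    let first = line.chars().next()?;
    if first.is_whitespace() || first == '#' || first == '-' {
        return None; // indented line, comment, or sequence item / document marker
    }
    let colon = line.find(':')?;
    // After the colon accept end of input, space, tab, or a line ending
    // (LF or CRLF), since lines come from split_inclusive('\n').
    match line[colon + 1..].chars().next() {
        None | Some(' ') | Some('\t') | Some('\r') | Some('\n') => Some(&line[..colon]),
        _ => None,
    }
}

/// Keeps only the LAST section for each duplicated top-level key.
pub fn deduplicate_top_level_keys(text: &str) -> String {
    // Group lines into sections, each starting at a top-level key line.
    let mut sections: Vec<(Option<&str>, String)> = Vec::new();
    for line in text.split_inclusive('\n') {
        match top_level_key(line) {
            Some(key) => sections.push((Some(key), line.to_owned())),
            None => match sections.last_mut() {
                Some((_, body)) => body.push_str(line),
                None => sections.push((None, line.to_owned())),
            },
        }
    }
    // Remember the index of the last section seen for each key.
    let mut last: HashMap<&str, usize> = HashMap::new();
    for (i, (key, _)) in sections.iter().enumerate() {
        if let Some(k) = *key {
            last.insert(k, i);
        }
    }
    let mut out = String::with_capacity(text.len());
    for (i, (key, body)) in sections.iter().enumerate() {
        let keep = match key {
            Some(k) => last.get(k) == Some(&i),
            None => true, // preamble before the first key
        };
        if keep {
            out.push_str(body);
        }
    }
    out
}
```

Keeping the last block matches the last-wins duplicate-key behavior the commit attributes to PyYAML, so the healed file parses to the same values Hermes would have used.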
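  • Fix Completions-to-Anthropic conversion not recording the actual returned model and mis-recording input tokens (#2774)
    * fix(proxy): record the model actually hit for streaming responses on the completions-to-claude conversion
    
    * style: cargo fmt fix
    
    * fix(proxy): fix double billing of input and cache_read on the completions-to-claude conversion
    
    * fix(proxy): fix input_tokens miscalculation on full cache hits
    
    * test: update input_tokens expectations to match the dedup logic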
  • fix(presets): add Kimi affiliate links (#3809)
    Problem: Kimi and Moonshot preset links were user-clickable without the cc-switch affiliate query.
    
    Decision: Update only UI-facing preset website/API-key links and leave API request endpoints untouched.
    
    Change: Add aff=cc-switch to Kimi/Moonshot websiteUrl values and Codex/OpenCode API-key links.
    
    Co-authored-by: xumingyuan <xumingyuan@msh.team>
  • refactor(presets): align CCSub to end of partner block across apps
    Move the CCSub preset to sit right after DouBaoSeed, at the end of the
    partner block and before the first non-partner provider, so its position
    is consistent across all six apps:
    
    - Codex / OpenCode: moved down from the 2nd slot (between Shengsuanyun and
      the next partner) to the block tail
    - OpenClaw / Hermes: moved up from the aggregator section to the block tail
    - Claude / Claude Desktop: already at the block tail
    
    Also add the missing CHANGELOG entry for the CCSub preset, and drop the
    provider preset order test that enforced a now-unneeded ordering invariant.
  • feat(presets): add CCSub provider across six apps
    Add CCSub, a multi-model aggregator partner, as a preset for Claude, Codex, OpenCode, OpenClaw, Claude Desktop, and Hermes. Each preset carries the referral signup link as apiKeyUrl.
    
    - Register the ccsub icon via iconUrls (1.1MB SVG URL import) + metadata
    - Add partnerPromotion copy in zh/en/ja
    - List CCSub in the sponsor section of all README locales
    - Use gpt-5.5 and gemini-3.1-pro as the OpenAI/Gemini model ids
  • chore(release): prepare v3.16.2
    Add the v3.16.2 CHANGELOG entry covering the 41 commits since v3.16.1,
    bump the version across package.json, tauri.conf.json, Cargo.toml, and
    Cargo.lock, and add trilingual (zh/en/ja) release notes.
  • fix(proxy): strip cache_control from OpenAI format conversion (#3841)
    * fix(proxy): strip cache_control from OpenAI format conversion (#3805)
    
    - Remove cache_control passthrough from system messages, text blocks,
      and tools to prevent 400 errors on strict OpenAI-compatible endpoints
    - Always simplify single text block content to plain string format
    - Fixes two format conversion bugs reported in issue #3805
    
    * fix(proxy): apply cargo fmt to fix CI formatting check
  • fix(providers): only block explicit official providers under proxy takeover
    The proxy-takeover block previously fell back to the isOfficial heuristic
    (empty base_url / missing key) when category was absent. That misjudged
    custom providers whose endpoint lives in meta or whose fields are simply
    unfilled: their switch button got disabled, making users think the config
    was broken. That extra UI block was also "virtual" — the executor in
    useProviderActions only ever honored category === "official", so the
    front end blocked more than the backend would enforce.
    
    Gate the block solely on explicit category === "official", matching the
    executor and unifying both verdicts on a single source of truth.
    
    Also rework the blocked-state UI:
    - drop the red "blocked" badge for a plain disabled Enable button
    - move title/cursor onto a wrapper span (disabled buttons set
      pointer-events:none, so an on-button title/cursor never fired)
    - replace the account-ban warning tooltip with a lighter hint
      (provider.blockedByProxyHint), four locales kept in sync
  • feat(proxy): map input_file and input_audio content parts to chat
    Convert Responses input_file (requiring file_id or file_data, never file_url which Chat file parts do not support) and input_audio parts into their Chat Completions equivalents, and handle top-level input_* items that previously fell through and were dropped, clearing stale pending reasoning for non-assistant messages.
  • fix(proxy): distinguish truncated chat streams from normal completion
    Replace the unconditional finalize at chat-to-responses stream end with a three-way guard: complete normally when finish_reason or [DONE] arrived, emit an incomplete response when substantive output exists without a finish_reason, and emit a failed (stream_truncated) event for empty truncation instead of masking it as completed. Also propagate late-arriving reasoning_content onto still-active tool-call items.
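
The three-way guard reads naturally as a small decision function. The enum and parameter names here are hypothetical; only the three outcomes and the `stream_truncated` label come from the commit message.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StreamEnd {
    Completed,
    Incomplete,
    Failed(&'static str),
}

/// Decides how to close a chat-to-responses stream when the upstream ends.
pub fn classify_stream_end(saw_finish_reason: bool, saw_done: bool, has_output: bool) -> StreamEnd {
    if saw_finish_reason || saw_done {
        StreamEnd::Completed
    } else if has_output {
        // Substantive output but no finish_reason: report as incomplete.
        StreamEnd::Incomplete
    } else {
        // Empty truncation must not be masked as a completed response.
        StreamEnd::Failed("stream_truncated")
    }
}
```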
  • fix(proxy): cache reasoning across turns for custom_tool_call and tool_search_call
    Generalize the cross-turn reasoning cache in codex chat history from function_call only to the full tool-call triad (function_call, custom_tool_call, tool_search_call) and their *_output counterparts, so apply_patch and tool-search calls keep their reasoning_content when restored via previous_response_id.
  • chore(presets): update SSSAiCode domain and endpoint nodes
    Switch website/apiKey URLs to sssaicodeapi.com and replace base URL
    nodes with node-hk.sssaicodeapi.com (default), node-hk.sssaiapi.com,
    and node-cf.sssaicodeapi.com across all 7 app presets.
  • fix(proxy): resolve actual port for ephemeral (port 0) listen config
    When listen_port is 0 the OS assigns the port at bind time, so the
    configured value can no longer be trusted for building takeover URLs.
    
    - server: read listener.local_addr() after bind and propagate the
      actual port to the global proxy port, status, and ProxyServerInfo
    - services: start the proxy before takeover when port is 0 so live
      configs get the real port instead of :0, and persist the resolved
      port back to the DB for DB-only URL paths; stop the pre-started
      server on any takeover failure
    - claude_desktop: reject an unresolved :0 port instead of emitting a
      broken gateway URL
    - build_proxy_urls: prefer the running server's port and error out if
      the port is still 0
    
    Add tests for takeover with an ephemeral port and the claude_desktop
    :0 rejection; switch existing codex takeover tests to an ephemeral
    port for isolation.
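
A std-only sketch of the bind-then-read pattern for port 0. The function names and the loopback address are assumptions for illustration; the project's server code is not shown in this log.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Binds to the configured port and returns the port the OS actually assigned.
/// With `configured_port == 0` the result differs from the configured value.
pub fn bind_and_resolve(configured_port: u16) -> io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind(SocketAddr::from(([127, 0, 0, 1], configured_port)))?;
    let actual = listener.local_addr()?.port();
    Ok((listener, actual))
}

/// Refuses to build a URL from an unresolved port instead of emitting ":0".
pub fn build_proxy_url(port: u16) -> Result<String, String> {
    if port == 0 {
        return Err("proxy port is still unresolved (0)".to_owned());
    }
    Ok(format!("http://127.0.0.1:{port}"))
}
```

The listener must stay alive while the URL is in use; dropping it releases the port, which is why the commit starts the proxy before takeover rather than probing for a free port.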
  • feat(proxy): add GET /v1/models endpoint for Codex CLI reachability check (#3818)
    * feat(proxy): add GET /v1/models endpoint for Codex CLI reachability check
    
    Codex CLI probes GET /v1/models at startup. Without this endpoint the proxy
    returns 404, causing Codex to fail before any request reaches the upstream
    LLM.
    
    Return an OpenAI-compatible model list derived from the cc-switch–managed
    model catalog file.
    
    Fixes #3812
    
    * fix(proxy): return Codex catalog schema from /v1/models
    
    Codex deserializes the response as a catalog with a top-level `models`
    field, not the OpenAI `{"object":"list","data":[...]}` envelope.
    Return the catalog file content directly so the format matches what
    Codex expects.
    
    Co-authored-by: Codex review bot
    
    * fix(proxy): guard /v1/models against serving stale catalog
    
    Only return the model catalog when config.toml still references it via
    `model_catalog_json`.  After switching to a provider without a custom
    catalog, the old file lingers on disk — serving it unconditionally
    would advertise the previous provider's models to Codex.
    
    Co-authored-by: Codex review bot
    
    * fix(proxy): match relative model_catalog_json in stale-guard
    
    cc-switch writes `model_catalog_json = "cc-switch-model-catalog.json"`
    (relative) via set_codex_model_catalog_json_field.  Match on the
    filename constant rather than the absolute path so the guard works
    with both relative and absolute paths.
    
    Co-authored-by: Codex review bot
    
    * fix(proxy): parse model_catalog_json field instead of substring match
    
    Replace raw config_text.contains() with proper TOML field parsing so
    commented-out lines and stray mentions of the filename in other fields
    don't defeat the stale guard.  Also switch from contains() to exact
    filename match (Path::new(val).file_name() == Some(...)) to stay
    consistent with resolve_cc_switch_catalog_path in codex_config.rs.
    
    Add log::debug! when the guard blocks serving so the operator can
    distinguish "no models configured" from "guard blocked stale catalog".
    
    * refactor(proxy): reuse resolve_cc_switch_catalog_path in handle_models
    
    Replace the inline config.toml parsing and filename match in
    handle_models with the existing resolve_cc_switch_catalog_path helper
    (now pub(crate)). This removes the duplicated stale-guard logic, keeps
    a single source of truth for catalog-path ownership, and makes the
    handler honor absolute model_catalog_json paths the same way Codex
    live-setting import does.
    
    ---------
    
    Co-authored-by: Jason <farion1231@gmail.com>
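
The exact-filename match used by the stale-catalog guard can be sketched as follows. The function name is hypothetical and the real `resolve_cc_switch_catalog_path` helper presumably does more (such as resolving the path); this only shows why `file_name()` equality is stricter than a substring check.

```rust
use std::ffi::OsStr;
use std::path::Path;

const CATALOG_FILE_NAME: &str = "cc-switch-model-catalog.json";

/// True when the parsed `model_catalog_json` value points at the managed
/// catalog file, whether written as a relative or an absolute path.
pub fn references_managed_catalog(model_catalog_json: Option<&str>) -> bool {
    model_catalog_json
        .map(|val| Path::new(val).file_name() == Some(OsStr::new(CATALOG_FILE_NAME)))
        .unwrap_or(false)
}
```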
  • [codex] Fix VS Code session previews (#3593)
    * Fix Codex VS Code session previews
    
    * fix(codex): use last IDE request heading for session previews
    
    A markdown heading inside the active selection / open file could precede the real injected request, so matching the first "## My request for Codex:" heading picked selection content instead of the user prompt. Scan for the last matching heading (the IDE injects the real request as the final section) on both the Rust title path and the frontend TOC preview path.
    
    Add regression tests for the selection-heading case, and pin the known best-effort limitation when the request body itself repeats the heading.
    
    ---------
    
    Co-authored-by: Jason <farion1231@gmail.com>
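
Scanning for the last heading rather than the first is a one-line change with `rfind`. The function name is hypothetical; the heading text is the one quoted in the commit message.

```rust
const REQUEST_HEADING: &str = "## My request for Codex:";

/// Returns the text after the LAST occurrence of the request heading,
/// so an identical heading inside the selection or open file is skipped.
pub fn extract_ide_request(text: &str) -> Option<&str> {
    let start = text.rfind(REQUEST_HEADING)? + REQUEST_HEADING.len();
    let body = text[start..].trim();
    if body.is_empty() {
        None
    } else {
        Some(body)
    }
}
```

As the commit notes, this stays best-effort: if the user's own request repeats the heading, the preview starts after that repeat.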
  • fix: normalize path separators in scan_dir_recursive for Windows (#3430)
    On Windows, Path::strip_prefix produces backslash-separated relative
    paths. The update-check matching logic uses rsplit('/') to extract the
    install name, so subdirectory skills (e.g. skills/my-skill) never
    matched and updates were silently skipped. Replace backslashes with
    forward slashes when building the directory string.
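
A sketch of why the normalization matters. It works on plain strings so it behaves the same on any platform; both function names are hypothetical.

```rust
/// Converts Windows-style separators so downstream '/'-based matching works.
pub fn normalize_relative_dir(rel: &str) -> String {
    rel.replace('\\', "/")
}

/// Extracts the install name the way the update check does: last '/' segment.
pub fn install_name(dir: &str) -> &str {
    dir.rsplit('/').next().unwrap_or(dir)
}
```

Without the normalization, `rsplit('/')` finds no separator in a backslash path and returns the whole string, which is why subdirectory skills never matched.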
  • fix(usage): correct inflated input_tokens in Claude stream parsing
    Some Anthropic-compatible SSE providers (e.g. qwen, minimax) report the
    full context (fresh + cached) as input_tokens in message_start, double
    counting the cached portion that is also reported in
    cache_read_input_tokens. This inflated the cacheable-input denominator
    and pushed the displayed cache hit rate artificially low.
    
    When a message_delta carries a smaller positive input_tokens, prefer it
    over the message_start value and adopt the cache counts from the same
    usage block to avoid double counting; fall back to the start cache
    values when the delta omits them. Native Claude (no input in delta) and
    OpenRouter-converted (input only in delta) paths are unchanged.
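    
    The selection rule can be sketched roughly as below. The Usage struct and
    merge_usage are illustrative names, not the parser's real types, and the
    fallback arm is an assumption about how the unchanged paths combine
    values.

```rust
/// Token counts from one `usage` block (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Usage {
    input_tokens: Option<u64>,
    cache_read_input_tokens: Option<u64>,
    cache_creation_input_tokens: Option<u64>,
}

/// Combine the usage from message_start with the usage from message_delta.
fn merge_usage(start: Usage, delta: Usage) -> Usage {
    match (start.input_tokens, delta.input_tokens) {
        // The delta carries a smaller positive count: treat the start value
        // as inflated, and take cache counts from the same delta block,
        // falling back to the start values when the delta omits them.
        (Some(s), Some(d)) if d > 0 && d < s => Usage {
            input_tokens: Some(d),
            cache_read_input_tokens: delta
                .cache_read_input_tokens
                .or(start.cache_read_input_tokens),
            cache_creation_input_tokens: delta
                .cache_creation_input_tokens
                .or(start.cache_creation_input_tokens),
        },
        // Assumed behavior for the other paths: keep the start values and
        // fill gaps from the delta (covers input present only in the delta).
        _ => Usage {
            input_tokens: start.input_tokens.or(delta.input_tokens),
            cache_read_input_tokens: start
                .cache_read_input_tokens
                .or(delta.cache_read_input_tokens),
            cache_creation_input_tokens: start
                .cache_creation_input_tokens
                .or(delta.cache_creation_input_tokens),
        },
    }
}
```

    For example, a start block reporting 1000 input and 800 cache-read
    tokens followed by a delta reporting 200 input tokens yields 200 input
    and 800 cache-read tokens, so the cached portion is counted once.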
    
    Refs #3580
  • fix(opencode): use OpenAI-compatible SDK for APINebula preset
    APINebula is an OpenAI-compatible relay (its base URL ends in /v1, matching
    its Codex/OpenClaw/Hermes presets), but the OpenCode preset loaded the
    @ai-sdk/openai package, which targets the OpenAI Responses API and fails
    against chat-completions-only upstreams. Switch the npm field to
    @ai-sdk/openai-compatible so requests use the OpenAI Chat Completions format.
  • feat(usage): add official subscription quota template with unified tier rendering
    Changes:
    - Add official_subscription template type for Claude/Codex/Gemini
    - Replace implicit 'category=official auto-query' with explicit opt-in template
    - Default disabled; users enable via usage script modal with configurable interval
    - Unify tier→label mapping across subscription and script paths via labeled_tier_parts()
    - Fix tray rendering: week aliases (seven_day/opus/sonnet) now use highest utilization
    - Add depth guard: official_subscription checks enabled flag in query_provider_usage_inner
    - Add cache invalidation symmetry: invalidate_subscription() for disabled providers
    - i18n: add templateOfficialSubscription + hint in zh/en/ja/zh-TW
    
    Backend (Rust):
    - provider.rs: add TEMPLATE_TYPE_OFFICIAL_SUBSCRIPTION branch, flatten SubscriptionQuota→UsageData
    - tray.rs: extract labeled_tier_parts() shared by both summary functions, use max_by for multi-alias groups
    - usage_cache.rs: add invalidate_subscription() method
    - Test coverage: add week-alias highest-utilization tests for both paths
    
    Frontend (TypeScript):
    - UsageScriptModal: add official_subscription to templates, auto-detect for official providers
    - ProviderCard: gate useUsageQuery with !isOfficialSubscriptionUsage, pass autoQueryInterval to footer
    - SubscriptionQuotaFooter: accept autoQueryInterval prop, default 0 (disabled)
    - constants.ts: add TEMPLATE_TYPES.OFFICIAL_SUBSCRIPTION
    
    Fixes tier rendering regression where:
    - Claude/Codex: seven_day was missed (only weekly_limit matched) → lost 7-day window in tray
    - Gemini: gemini_pro/flash/flash_lite fell through to fallback → leaked machine names
    - Multi-window (opus+sonnet): find() took first, not worst → underestimated utilization and emoji color
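    
    A small sketch of the find() versus max_by difference for a multi-alias
    group. The Tier struct, the alias names, and worst_weekly_tier are
    assumptions for illustration; the real mapping lives in
    labeled_tier_parts().

```rust
/// One quota window as reported by a provider (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Tier {
    name: String,
    utilization: f64, // percent used
}

/// Names assumed to belong to the weekly group; the exact alias strings are
/// a guess based on the commit text.
const WEEK_ALIASES: [&str; 4] = [
    "weekly_limit",
    "seven_day",
    "seven_day_opus",
    "seven_day_sonnet",
];

/// Pick the weekly tier with the highest utilization, so the tray shows the
/// worst window rather than whichever alias happens to come first.
fn worst_weekly_tier(tiers: &[Tier]) -> Option<&Tier> {
    tiers
        .iter()
        .filter(|t| WEEK_ALIASES.contains(&t.name.as_str()))
        .max_by(|a, b| a.utilization.total_cmp(&b.utilization))
}
```

    With opus at 40% and sonnet at 85%, a find() over the same filter would
    report 40%; max_by reports 85%.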
    
    All tests pass (cargo test + cargo clippy clean).
  • fix: polish usage statistics ui (#3426)
    * fix: improve usage statistics ui
    
    * chore: remove unused token suffix translation
    
    ---------
    
    Co-authored-by: Jason <farion1231@gmail.com>
  • fix: disable auto-capitalize on Input component for macOS (#3626)
    Add autoComplete, autoCorrect, autoCapitalize, and spellCheck attributes
    to prevent macOS from auto-capitalizing the first letter in input fields.
  • fix(coding-plan): route Zhipu quota query to the user's configured base URL (#3702)
    Fixes #3701.
    
    `query_zhipu` was hard-coded to `https://api.z.ai`, so a user who
    configured the mainland China preset (`Zhipu GLM` on
    `open.bigmodel.cn`) could not retrieve usage once the international
    endpoint became unreachable from their network (or vice versa).
    
    The two endpoints share the same quota path (`/api/monitor/usage/quota/limit`)
    and return JSON in the same shape. Crucially, each user only ever uses
    one of them: the quota host is the same host they already use for
    coding. So we can route by the configured `base_url` and skip the
    cross-host fallback entirely.
    
    What this PR changes
    --------------------
    
    This PR adds a single helper that maps the user's `base_url` to the
    matching quota host, and rebuilds `query_zhipu` to take `base_url` and
    pick the right host:
    
        fn zhipu_quota_base(base_url: &str) -> &'static str {
            if base_url.contains("bigmodel.cn") {
                "https://open.bigmodel.cn"
            } else {
                "https://api.z.ai"
            }
        }
    
        async fn query_zhipu(base_url: &str, api_key: &str) -> SubscriptionQuota {
            let url = format!(
                "{}/api/monitor/usage/quota/limit",
                zhipu_quota_base(base_url),
            );
            // ... original 401/403 -> Expired / make_error / parse path, unchanged
        }
    
    The dispatcher already distinguishes `ZhipuCn` from `ZhipuEn` via
    `detect_provider()` and routes the call through
    `query_zhipu(base_url, api_key)` in the same match arm.
    
    Why no cross-host fallback
    --------------------------
    
    Farion's review pointed out that adding a fallback would be
    over-engineered and actively harmful:
    
    1. Reachability is determined by the preset the user chose. Their
       configured host is the host they are already using to run coding;
       if it were unreachable, the user could not have reached the
       "query usage" step at all.
    
    2. The fallback path had to distinguish "both 401/403" (genuine bad
       key) from "one 401/403 + one network error" (regional block). It
       silently misclassified the second case as a generic query failure
       and hid the upstream "Session expired" UX for invalid keys.
    
    3. It also cost a worst-case serial timeout of ~10s + 10s ≈ 20s for
       users on a working primary.
    
    With the URL-based routing in place, 401/403 returns to the original
    `CredentialStatus::Expired` semantics, giving the same UX as
    `query_kimi` and `query_minimax`.
    
    Files changed
    -------------
    
    - `src-tauri/src/services/coding_plan.rs` — 1 file, +35 / -20
    
    Testing
    -------
    
    - 3 new `zhipu_quota_base_*` routing tests
    - 15 existing `coding_plan` parser tests still pass
    - `cargo fmt --check` clean
    - `cargo clippy --lib --no-deps -- -D warnings` clean
    
    Co-authored-by: Yongmao Luo <yongmao.luo@columbia.edu>
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
  • fix(proxy): skip backup/restore when Live is already a proxy placeholder (#3689)
    When a previous stop_with_restore() failed to restore the user's original
    Live (e.g. app crash mid-stop, settings.json unwritable, or any pre-existing
    state where Live carries the proxy placeholders), the next
    start_with_takeover would read the still-placeholder Live and overwrite the
    good backup row with the proxy config itself. After that, every subsequent
    stop would restore the proxy placeholder back to Live, making the proxy
    toggle a no-op and leaving the client pinned at http://127.0.0.1:15721.
    
    Fix: in both backup write paths (`backup_live_configs` and
    `backup_live_config_strict`) detect that Live is already a proxy
    placeholder and skip the save, preserving any existing good backup. In
    `restore_live_config_for_app_with_fallback_inner`, detect the same
    condition in the parsed backup and fall through to the existing
    SSOT (current provider DB) path that was added in c3d810a.
    
    Both sides share a new `live_has_proxy_placeholder_for_app` dispatch
    helper so the placeholder check stays in lockstep with the existing
    per-app detection functions.
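    
    The guard on both sides can be sketched as follows. All names here are
    hypothetical, configs are modeled as plain strings, and detecting the
    placeholder by searching for the local proxy address is an assumption;
    the real check dispatches to the per-app detection functions.

```rust
/// Local proxy address mentioned in the commit text.
const PROXY_PLACEHOLDER_URL: &str = "http://127.0.0.1:15721";

/// Hypothetical stand-in for the per-app placeholder detection.
fn live_has_proxy_placeholder(config: &str) -> bool {
    config.contains(PROXY_PLACEHOLDER_URL)
}

/// Backup write path: skip the save when Live already holds the proxy
/// placeholder, so an existing good backup is preserved.
fn next_backup(existing: Option<String>, live_config: &str) -> Option<String> {
    if live_has_proxy_placeholder(live_config) {
        existing
    } else {
        Some(live_config.to_string())
    }
}

/// Restore path: use the backup only when it is not itself a placeholder,
/// otherwise fall through to the config derived from the provider DB.
fn config_to_restore(backup: Option<&str>, provider_db_config: &str) -> String {
    match backup {
        Some(b) if !live_has_proxy_placeholder(b) => b.to_string(),
        _ => provider_db_config.to_string(),
    }
}
```

    Sharing one detection function between the two paths mirrors the intent
    of keeping the write-side and restore-side checks in lockstep.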
    
    Co-authored-by: Yongmao Luo <yongmao.luo@columbia.edu>
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>