everything-claude-code

test(lib): make concurrent-write test actually concurrent + use regex matcher for assert.throws

Two round-1 review findings in `tests/lib/session-bridge.test.js`,
both about test correctness rather than the underlying fix:

1. **greptile P1 + coderabbitai Major + cubic P2 (all three): concurrent-write test ran sequentially.**

The test spawned two child processes with two consecutive
`spawnSync` calls. Because `spawnSync` blocks until the child
exits, the second writer started *after* the first finished —
the two writers never overlapped, so the rename race the fix
targets was never actually exercised. The test would have passed
with the old broken `${target}.tmp` suffix.

Fix: introduce a one-off "race runner" helper that runs inside
its own subprocess and uses async `spawn` to start both writers
simultaneously. The runner waits for both to exit (the event
loop is local to the runner subprocess, so this stays compatible
with the synchronous test harness used elsewhere in this file)
and reports both exit codes plus stderrs on stdout. The test
then calls the runner via `spawnSync` and parses the result.
Both writer children now overlap for the duration of their 200
`writeBridgeAtomic` calls each, which is enough wall time to
reliably trigger the rename race against the pre-fix code.

Verified: with the fixed `${target}.${pid}.${nonce}.tmp` suffix,
the test passes; with the old fixed `${target}.tmp` suffix
reintroduced, it fails as expected (one writer hits ENOENT on
roughly half its rename calls).

2. **greptile P2 + cubic P3: `assert.throws` used a string as the second argument.**

Node deprecated passing a string as the second argument to
`assert.throws` years ago: the string is silently treated as
the assertion failure message (what to print when the function
does *not* throw) rather than as an error matcher. The check
passed for any thrown error, not just the rename failure.

Fix: pass a regex matcher as the second arg and keep the
explanatory text as the third. The regex matches `EISDIR`,
`EPERM`, `ENOTDIR`, or `ENOENT` because `renameSync` of a
regular tmp file onto an existing directory raises different
codes on Linux / macOS / BSD — making the matcher portable
across CI runners.

Test count unchanged at 14; `npm test` green; `npm run lint` clean.

The two helper files (`tests/__tmp_bridge_writer.js`,
`tests/__tmp_bridge_race_runner.js`) are written and unlinked
inside the test's try/finally so they never persist beyond the
test run.

Jamkris · 2026-05-19 04:57:10 -04:00

d904edc615

test(lib): concurrent writeBridgeAtomic + tmp-cleanup regression

Two regression tests pin down the previous two commits' atomic-rename
fixes:

1. **concurrent writes don't throw ENOENT or corrupt the file** —
   spawns two child Node processes (`tests/__tmp_bridge_writer.js`
   created in-test, cleaned up in finally) that each call
   `writeBridgeAtomic(sid, …)` 200 times against the same session
   ID with independent payloads. Asserts both subprocesses exit 0
   (the previous implementation produced ENOENT on roughly 50% of
   rename calls, all swallowed by the in-test catch) and the final
   bridge file is parseable JSON belonging to one of the two writers
   (last-writer-wins is fine; the contract is *no corruption* and
   *no rename ENOENT*, not data preservation).

2. **tmp file cleanup on rename failure** — pre-creates a directory
   at the target bridge path so `renameSync(tmp, target)` fails,
   calls `writeBridgeAtomic`, asserts the call throws AND that no
   tmp file with the writer's `pid.<nonce>.tmp` prefix is left
   behind in `os.tmpdir()`. The previous code had no cleanup; the
   fix's `try/catch + unlinkSync` keeps tmpdir from accumulating
   orphan files across repeated rename failures.

The first test deliberately writes independent payloads from each
subprocess so this regression doesn't try to claim a property the
fix doesn't actually deliver (read-modify-write race in the caller
is a separate issue and out of scope per PR body).

Test count: 12 → 14 in `tests/lib/session-bridge.test.js`;
`npm test` green; `npm run lint` clean.

Jamkris · 2026-05-19 04:57:10 -04:00

5acb01a276

test(ci): regression coverage for newly-covered invisible code points

9 new test cases pin down the two previous commits' denylist
extensions. Each verifies both detection (validator exit non-zero +
the expected `dangerous-invisible U+<HEX>` line on stderr) and,
where applicable, `--write` sanitization.

Coverage:

Tag block (commit 1):
- U+E0041 TAG LATIN CAPITAL LETTER A — the range's printable ASCII
  shadow; this is the byte sequence demonstrated in published ASCII
  smuggling proofs of concept.
- U+E007F CANCEL TAG — the range end.

Other invisibles (commit 2):
- U+180E MONGOLIAN VOWEL SEPARATOR
- U+115F HANGUL CHOSEONG FILLER
- U+1160 HANGUL JUNGSEONG FILLER
- U+2061 FUNCTION APPLICATION (range start)
- U+2064 INVISIBLE PLUS (range end)
- U+3164 HANGUL FILLER

Detection table is data-driven (one loop, one assertion per row) so
adding the next invisible to the denylist also gets a paired
regression test by simply appending to NEWLY_COVERED_RANGES.

Plus a `--write` integration test:
- writes a markdown file containing both Tag block (5 chars) and
  U+180E, runs `--write`, asserts both removed and surrounding text
  preserved character-for-character ('# Title\n\nBenigntext.\n').
- re-runs the validator without `--write` and asserts exit 0,
  confirming the sanitizer's output is idempotent under the
  extended denylist.

Test count: 5 → 14 in this file; full `yarn test` green; `yarn lint`
clean.

Jamkris · 2026-05-18 21:20:36 -04:00

33ed494adf

docs: refresh rc1 publication evidence

Affaan Mustafa · 2026-05-18 16:12:37 -04:00

4470e2e670

docs: refresh release blockers evidence

Affaan Mustafa · 2026-05-18 15:23:48 -04:00

0f1775e30b

docs: add discussion response playbook

Affaan Mustafa · 2026-05-18 14:39:11 -04:00

12ac22e674

test: normalize release workflow line endings

Affaan Mustafa · 2026-05-18 13:53:26 -04:00

97567a91e7

security: scope release oidc publishing

Affaan Mustafa · 2026-05-18 13:41:10 -04:00

7911af4a39

fix: treat MCP HTTP 406 probes as reachable

Affaan Mustafa · 2026-05-18 12:48:52 -04:00

386326df8e

docs: refresh publication readiness gate

Affaan Mustafa · 2026-05-18 10:49:49 -04:00

b41e6fb3d0

test: enforce release publication checklist in readiness gates

Affaan Mustafa · 2026-05-18 09:10:51 -04:00

680aeff0fb

docs: add release plugin publication checklist

Affaan Mustafa · 2026-05-18 08:56:17 -04:00

6c0fbfb6c5

docs: finish owner queue cleanup

Affaan Mustafa · 2026-05-18 06:35:44 -04:00

cdc92de42a

docs: record owner-wide queue cleanup

Affaan Mustafa · 2026-05-18 06:16:45 -04:00

08807e7fd6

docs: refresh billing readback gate evidence

Affaan Mustafa · 2026-05-18 04:30:09 -04:00

ff3eaff137

test: stabilize repair lifecycle on Windows

Affaan Mustafa · 2026-05-18 03:48:51 -04:00

bf17737969

docs: mirror target billing readback gate

Affaan Mustafa · 2026-05-18 03:27:42 -04:00

fb4b0c8dce

fix: ignore defensive ioc deny rules

Affaan Mustafa · 2026-05-18 02:29:59 -04:00

04d4d81938

docs: refine billing readback dashboard blocker

Affaan Mustafa · 2026-05-18 02:03:37 -04:00

e53933de1b

Merge PR #1976 provider response guards

Affaan Mustafa · 2026-05-18 01:05:37 -04:00

80f6c27957

fix: harden openai-compatible provider responses

Affaan Mustafa · 2026-05-18 01:04:28 -04:00

eb0d893948

test: skip insaits monitor subprocesses without python

Affaan Mustafa · 2026-05-18 00:47:05 -04:00

044d1863d0

docs: mirror marketplace billing provenance gate

Affaan Mustafa · 2026-05-18 00:36:01 -04:00

c276639bc7

test: align dashboard fixture with May 18 evidence

Affaan Mustafa · 2026-05-17 22:29:06 -04:00

d14191bed8

test: point platform audit at May 18 evidence

Affaan Mustafa · 2026-05-17 22:22:02 -04:00

5475db4f97

fix(hooks): persist metrics warning dedup

Affaan Mustafa · 2026-05-17 21:41:24 -04:00

9b1d891870

fix(hooks): suppress repeated metrics warning breadcrumbs

Affaan Mustafa · 2026-05-17 21:41:24 -04:00

4cafdb8304

fix(hooks): log fail-open breadcrumb on parse/read errors in metrics bridge

coderabbitai flagged: the two `catch` blocks in `readSessionCost`
silently swallowed every failure mode. A malformed `costs.jsonl`
row, a permission error opening the file, or any other unexpected
I/O failure would silently return zero cost — masking real
problems and feeding stale or zero numbers into
`ecc-context-monitor.js` (which then injects them as
`additionalContext` into the live model turn).

Fix two things, both fail-open-preserving:

1. **Inner JSON.parse catch** — count malformed lines and write
   one aggregated breadcrumb per call:

     [ecc-metrics-bridge] skipped N malformed line(s) in <path>

   Aggregating (rather than per-line) keeps a log-flooded
   `costs.jsonl` diagnosable without overwhelming stderr.

2. **Outer fs.readFileSync catch** — write a breadcrumb on real
   errors, but stay silent on `ENOENT`. The "no costs.jsonl yet"
   case is genuinely normal (no Stop event has fired this session)
   and producing noise on every PreToolUse before the first Stop
   would be reviewer-visible spam. All other error codes
   (`EACCES`, `EISDIR`, `EMFILE`, …) get:

     [ecc-metrics-bridge] failing open after <name> reading <path>: <msg>

In both cases the function still returns the zero-cost fallback
so the bridge never breaks tool execution — only the
diagnosability changes.

Two new regression tests in
`tests/hooks/ecc-metrics-bridge.test.js`:

  ✓ readSessionCost writes a stderr breadcrumb when malformed
    lines are skipped — feeds 4 rows (2 valid, 2 malformed),
    asserts the last valid row still wins AND captured stderr
    contains "skipped 2 malformed line(s)".

  ✓ readSessionCost stays silent when costs.jsonl does not exist
    (ENOENT) — uses a fresh tmp HOME with no metrics dir, asserts
    zero return AND empty stderr.

Test count: 16 → 18; `npm test` green; `yarn lint` clean.

Jamkris · 2026-05-17 21:41:24 -04:00

086e44c964

fix(hooks): scan full costs.jsonl when locating session row

`readSessionCost` read only the trailing 8 KiB of
`~/.claude/metrics/costs.jsonl` to "avoid scanning entire file".
That ceiling is the opposite-sign sibling of the double-count bug
fixed in the previous commit: once a session's most recent
cumulative row gets pushed past the 8 KiB window by newer rows
from other sessions, the bridge silently reports `totalCost: 0`,
`totalIn: 0`, `totalOut: 0` for that session — same false signal
to `ecc-context-monitor.js`, same wrong number injected into the
live model turn as `additionalContext`.

`cost-tracker.js` has no rotation policy, so on any non-trivial
workstation costs.jsonl grows past 8 KiB within minutes of normal
use. For users who keep multiple concurrent sessions, this means
the second-and-later sessions silently report zero almost
immediately.

Reproduced before this commit:

  $ HOME=/tmp/eccc node -e '
      const fs = require("fs");
      const m = require("./scripts/hooks/ecc-metrics-bridge.js");
      // S1 row at file start, then 200 rows of OTHER-session noise (~16 KiB).
      // S1 is the row we want, but it sits past the 8 KiB tail.
      const s1 = `{"session_id":"S1","estimated_cost_usd":0.5,"input_tokens":500,"output_tokens":250}`;
      const other = `{"session_id":"OTHER","estimated_cost_usd":1,"input_tokens":100,"output_tokens":50}`;
      fs.mkdirSync("/tmp/eccc/.claude/metrics", { recursive: true });
      fs.writeFileSync("/tmp/eccc/.claude/metrics/costs.jsonl",
        [s1, ...Array(200).fill(other)].join("\\n") + "\\n");
      console.log(JSON.stringify(m.readSessionCost("S1")));'
  {"totalCost":0,"totalIn":0,"totalOut":0}

Expected: `{"totalCost":0.5, "totalIn":500, "totalOut":250}` (the
S1 row that exists in the file).
Actual: zero — the row is past the 8 KiB tail.

Fix: drop the `fs.openSync` + bounded `fs.readSync` + position
arithmetic in favour of `fs.readFileSync(costsPath, 'utf8')` and
iterate every line. Each row is ~150 bytes; even 100k rows is
~15 MB and a single sync read on PreToolUse is in the low ms.
If file rotation lands in `cost-tracker.js` later, this scan
becomes proportionally cheaper.

After this commit the reproduction above returns
`{"totalCost":0.5, "totalIn":500, "totalOut":250}`.

Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
`readSessionCost finds session row beyond the old 8 KiB tail
boundary`. The test asserts the costs.jsonl fixture is > 8 KiB
before reading so any reintroduction of a bounded tail would
re-fail the test (i.e. the assertion is the contract, not the
specific number 8192).

Together with the previous commit, both directions of the
metrics-bridge cost-reporting bug are closed.

Jamkris · 2026-05-17 21:41:24 -04:00

63c9788f50

fix(hooks): use last cumulative row for session cost in metrics bridge

`ecc-metrics-bridge.js#readSessionCost` summed the
`estimated_cost_usd`, `input_tokens`, and `output_tokens` of
every matching row in `~/.claude/metrics/costs.jsonl`. That breaks
the documented contract of `scripts/hooks/cost-tracker.js`, which
explicitly states (in its module docblock):

  Cumulative behavior: Stop fires per assistant response, not
  per session. Each row therefore represents the cumulative
  session total up to that point. To get per-session cost, take
  the last row per session_id.

Summing N cumulative rows over-counts by roughly (N+1)/2 ×. For a
session with 3 rows at 0.01, 0.02, 0.03 USD (true running total
0.03), the bridge today reports 0.06 USD. The over-counted value
feeds `ecc-context-monitor.js`, which then trips its
COST_NOTICE_USD / COST_WARNING_USD / COST_CRITICAL_USD thresholds
on phantom spend AND injects the inflated number as
`additionalContext` into the live model turn — so the agent
itself is told a wrong cost.

Reproduced on `main` before this commit:

  $ cat > /tmp/eccc/.claude/metrics/costs.jsonl <<EOF
  {"session_id":"S1","estimated_cost_usd":0.01,"input_tokens":333,"output_tokens":166}
  {"session_id":"S1","estimated_cost_usd":0.02,"input_tokens":666,"output_tokens":333}
  {"session_id":"S1","estimated_cost_usd":0.03,"input_tokens":1000,"output_tokens":500}
  EOF

  $ HOME=/tmp/eccc node -e 'const m = require("./scripts/hooks/ecc-metrics-bridge.js"); \
      console.log(JSON.stringify(m.readSessionCost("S1")))'
  {"totalCost":0.06,"totalIn":1999,"totalOut":999}

Expected: `{"totalCost":0.03,"totalIn":1000,"totalOut":500}` (the
last cumulative row).
Actual: 2× over-count.

Fix: replace `+=` with `=` in the matching branch so the assigned
values reflect the most recent row encountered. The iteration
order is file order, which is also event time order, so the last
assignment wins — exactly the contract cost-tracker writes
against.

After this commit the reproduction above returns
`{"totalCost":0.03,"totalIn":1000,"totalOut":500}`.

Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
`readSessionCost returns the LAST cumulative row, not the sum
(cost-tracker contract)`. The existing
`readSessionCost does not include unrelated default-session rows`
test happened to pass even with the bug because it only had one
target-session row — single-row sessions are coincidentally
correct under both formulas. The new test uses three rows so the
two formulas diverge.

A second issue in the same function — the 8 KiB tail-only read
silently drops older rows once a session's recent cumulative
totals scroll past that window — is fixed in the next commit.

Jamkris · 2026-05-17 21:41:24 -04:00

4f21ed2acf

test(ci): coverage for round-1 fixes (quoted write-all, dedup, lifecycle scope)

Three test changes in response to the round-1 review:

1. **Add quoted-write-all coverage** (cubic P0 follow-up).
   Two new cases assert the regex now matches the double-quoted and
   single-quoted YAML forms of `permissions: "write-all"`:
     - `rejects double-quoted permissions: "write-all"`
     - `rejects single-quoted permissions: 'write-all'`
   Both fixtures trigger only the persist-credentials gate, so they
   exercise the WRITE_ALL_PATTERN OR-clause in isolation.

2. **Add expression+ref dedup coverage** (greptile P2 follow-up).
   `emits a single violation when both expressionPattern and refPattern
   match the same step` — uses `refs/pull/${{ … head.sha }}/merge` as
   the fixture (which matches both patterns) and counts ERROR lines for
   the `pull_request_target` rule, asserting exactly one. Re-introducing
   the duplicate-push bug would re-fail this test immediately.

3. **Drop the `npm ci without --ignore-scripts under write-all` test**
   (greptile P2). That test happened to pass under the previous
   `--ignore-scripts` regex, but `UNSAFE_INSTALL_PATTERNS` (added in
   `f7035b56`) fires unconditionally for every workflow regardless of
   permissions. So the test was exercising a pre-existing code path
   that has nothing to do with WRITE_ALL_PATTERN. Reviewer flagged this
   could mislead future contributors into thinking lifecycle-script
   enforcement is gated on write permissions.

   Replaced by the surrounding `rejects checkout credential persistence
   in workflows with permissions: write-all` test (already present) and
   the new quoted-form tests above, which all exercise the actual
   persist-credentials gate that the WRITE_ALL_PATTERN clause newly
   activates.

Test count: 22 → 24 (added 3 new, dropped 1). All green; `yarn lint`
clean.

The cohort comment above the write-all block was also tightened to
explicitly note that "the lifecycle-script gate already fires
unconditionally for every workflow" so the next reader sees the
distinction up front.

Jamkris · 2026-05-17 21:19:29 -04:00

7bb3172041

fix(ci): flag refs/pull checkouts under pull_request_target

The `pull_request_target` rule's `expressionPattern` matches only
the canonical `github.event.pull_request.head.{ref,sha,repo.full_name}`
interpolations. It does not match the second canonical form of
the same exploit — fetching `refs/pull/<N>/{head,merge}` directly:

  - uses: actions/checkout@v4
    with:
      ref: refs/pull/${{ github.event.pull_request.number }}/merge

The merge-ref variant is what GitHub's own security guidance calls
out as the highest-severity privilege-escalation pattern under
`pull_request_target`: it materialises the PR's merge commit
(attacker code spliced with base), executes inside a workflow that
has full repo-scoped tokens, and gives the attacker the chance to
exfiltrate secrets or push to default branches. `refs/pull/N/head`
is functionally equivalent — same source, same trust boundary.

Reproduced on `main` before this commit:

  $ cat /tmp/bad.yml
  name: bad
  on: { pull_request_target: { types: [opened] } }
  permissions: { contents: read }
  jobs:
    do:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
          with:
            ref: refs/pull/${{ github.event.pull_request.number }}/merge
            persist-credentials: false
        - run: npm ci --ignore-scripts

  $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
  Validated workflow security for 1 workflow files
  $ echo $?
  0

Expected: violation flagging the refs/pull checkout under pull_request_target.
Actual: passes silently.

Fix: add a `refPattern` to the `pull_request_target` rule:

    /^\s*ref:\s*['"]?[^'"\n]*refs\/(?:remotes\/)?pull\/[^'"\n\s]+/m

and apply it per checkout step inside the existing
event-gated loop. The pattern matches the ref VALUE so it catches
all interpolation shapes — `refs/pull/123/head`,
`refs/pull/${{ github.event.pull_request.number }}/merge`,
`${{ env.FOO }}/refs/pull/N/head` — without enumerating the
possible interpolations themselves.

Scoping: the rule is already gated on the workflow containing
`pull_request_target:`, so non-privileged `pull_request` workflows
that legitimately check out a PR ref are not affected.

After this commit the reproduction above exits 1 with:

  ERROR: bad.yml:10 - pull_request_target must not checkout an untrusted pull_request head ref/repository

Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
  - rejects pull_request_target + refs/pull/<N>/merge
  - rejects pull_request_target + hardcoded refs/pull/<N>/head
  - allows pull_request_target with no `with.ref:` (base-ref checkout —
    the safe pattern from GitHub's own guidance)

Test count: 17 → 20 in this file; full `yarn test` still green.

Together with the previous commit, this closes the two
independent `validate-workflow-security.js` bypasses I found.

Jamkris · 2026-05-17 21:19:29 -04:00

cdbc925d89

fix(ci): treat 'permissions: write-all' as a write-permission gate

`WRITE_PERMISSION_PATTERN` in `validate-workflow-security.js`
enumerates named GitHub Actions scopes (`contents: write`,
`issues: write`, etc.) to decide whether a workflow needs to:
  - disable `persist-credentials` on `actions/checkout`
  - pass `--ignore-scripts` to `npm ci`

The pattern misses the top-level shorthand `permissions:
write-all`, which is the strictly broader form — it grants every
named scope write access in a single line. As a result, a
workflow that opts into write-all currently slips both gates.

Reproduced on `main` before this commit:

  $ cat /tmp/bad.yml
  name: bad
  on: [push]
  permissions: write-all
  jobs:
    do:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - run: npm ci

  $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
  Validated workflow security for 1 workflow files
  $ echo $?
  0

Expected: at least two violations (missing `persist-credentials:
false`, missing `--ignore-scripts`).
Actual: passes silently.

Fix: add a sibling pattern `WRITE_ALL_PATTERN` that matches
`^\s*permissions:\s*write-all\b` and OR it with
`WRITE_PERMISSION_PATTERN` at the single gate. Both top-level
and job-level `permissions:` blocks satisfy the `^\s*` prefix.

After this commit the reproduction above exits 1 with:

  ERROR: bad.yml:8 - workflows with write permissions must disable checkout credential persistence
  ERROR: bad.yml:9 - workflows with write permissions must install npm dependencies with --ignore-scripts

Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
  - rejects write-all + credential-persisting checkout
  - rejects write-all + `npm ci` without `--ignore-scripts`
  - allows write-all when both gates are satisfied (no over-block)

Test count: 14 → 17 in this file; full `yarn test` still green.

A separate `refs/pull/N/merge` bypass under `pull_request_target`
exists in the same validator and is fixed in the next commit.

Jamkris · 2026-05-17 21:19:29 -04:00

7f971b7e6f

fix(installer): harden locale docs install

Affaan Mustafa · 2026-05-17 20:46:04 -04:00

666b4e2261

fix: skip disabled discussion queries in platform audit

Affaan Mustafa · 2026-05-17 20:32:09 -04:00

519c592a12

security: cover gh-token-monitor token persistence

Affaan Mustafa · 2026-05-17 17:46:35 -04:00

36d390aa7d

Add preview pack smoke gate