625 Commits

  • test(lib): make concurrent-write test actually concurrent + use regex matcher for assert.throws
    Two round-1 review findings in `tests/lib/session-bridge.test.js`,
    both about test correctness rather than the underlying fix:
    
    1. **greptile P1 + coderabbitai Major + cubic P2 (all three): concurrent-write test ran sequentially.**
    
       The test spawned two child processes with two consecutive
       `spawnSync` calls. Because `spawnSync` blocks until the child
       exits, the second writer started *after* the first finished —
       the two writers never overlapped, so the rename race the fix
       targets was never actually exercised. The test would have passed
       with the old broken `${target}.tmp` suffix.
    
       Fix: introduce a one-off "race runner" helper that runs inside
       its own subprocess and uses async `spawn` to start both writers
       simultaneously. The runner waits for both to exit (the event
       loop is local to the runner subprocess, so this stays compatible
       with the synchronous test harness used elsewhere in this file)
       and reports both exit codes plus stderrs on stdout. The test
       then calls the runner via `spawnSync` and parses the result.
       Both writer children now overlap for the duration of their 200
       `writeBridgeAtomic` calls each, which is enough wall time to
       reliably trigger the rename race against the pre-fix code.
    
       Verified: with the fixed `${target}.${pid}.${nonce}.tmp` suffix,
       the test passes; with the old fixed `${target}.tmp` suffix
       reintroduced, it fails as expected (one writer hits ENOENT on
       roughly half its rename calls).
    
    2. **greptile P2 + cubic P3: `assert.throws` used a string as the second argument.**
    
       Node deprecated passing a string as the second argument to
       `assert.throws` years ago: the string is silently treated as
       the assertion failure message (what to print when the function
       does *not* throw) rather than as an error matcher. The check
       passed for any thrown error, not just the rename failure.
    
       Fix: pass a regex matcher as the second arg and keep the
       explanatory text as the third. The regex matches `EISDIR`,
       `EPERM`, `ENOTDIR`, or `ENOENT` because `renameSync` of a
       regular tmp file onto an existing directory raises different
       codes on Linux / macOS / BSD — making the matcher portable
       across CI runners.
    
    Test count unchanged at 14; `npm test` green; `npm run lint` clean.
    
    The two helper files (`tests/__tmp_bridge_writer.js`,
    `tests/__tmp_bridge_race_runner.js`) are written and unlinked
    inside the test's try/finally so they never persist beyond the
    test run.
  • test(lib): concurrent writeBridgeAtomic + tmp-cleanup regression
    Two regression tests pin down the previous two commits' atomic-rename
    fixes:
    
    1. **concurrent writes don't throw ENOENT or corrupt the file** —
       spawns two child Node processes (`tests/__tmp_bridge_writer.js`
       created in-test, cleaned up in finally) that each call
       `writeBridgeAtomic(sid, …)` 200 times against the same session
       ID with independent payloads. Asserts both subprocesses exit 0
       (the previous implementation produced ENOENT on roughly 50% of
       rename calls, all swallowed by the in-test catch) and the final
       bridge file is parseable JSON belonging to one of the two writers
       (last-writer-wins is fine; the contract is *no corruption* and
       *no rename ENOENT*, not data preservation).
    
    2. **tmp file cleanup on rename failure** — pre-creates a directory
       at the target bridge path so `renameSync(tmp, target)` fails,
       calls `writeBridgeAtomic`, asserts the call throws AND that no
       tmp file with the writer's `pid.<nonce>.tmp` prefix is left
       behind in `os.tmpdir()`. The previous code had no cleanup; the
       fix's `try/catch + unlinkSync` keeps tmpdir from accumulating
       orphan files across repeated rename failures.
    
    The first test deliberately writes independent payloads from each
    subprocess so this regression doesn't try to claim a property the
    fix doesn't actually deliver (read-modify-write race in the caller
    is a separate issue and out of scope per PR body).
    
    Test count: 12 → 14 in `tests/lib/session-bridge.test.js`;
    `npm test` green; `npm run lint` clean.
  • test(ci): regression coverage for newly-covered invisible code points
    9 new test cases pin down the two previous commits' denylist
    extensions. Each verifies both detection (validator exit non-zero +
    the expected `dangerous-invisible U+<HEX>` line on stderr) and,
    where applicable, `--write` sanitization.
    
    Coverage:
    
    Tag block (commit 1):
    - U+E0041 TAG LATIN CAPITAL LETTER A — the range's printable ASCII
      shadow; this is the byte sequence demonstrated in published ASCII
      smuggling proofs of concept.
    - U+E007F CANCEL TAG — the range end.
    
    Other invisibles (commit 2):
    - U+180E MONGOLIAN VOWEL SEPARATOR
    - U+115F HANGUL CHOSEONG FILLER
    - U+1160 HANGUL JUNGSEONG FILLER
    - U+2061 FUNCTION APPLICATION (range start)
    - U+2064 INVISIBLE PLUS (range end)
    - U+3164 HANGUL FILLER
    
    Detection table is data-driven (one loop, one assertion per row) so
    adding the next invisible to the denylist also gets a paired
    regression test by simply appending to NEWLY_COVERED_RANGES.
    
    Plus a `--write` integration test:
    - writes a markdown file containing both Tag block (5 chars) and
      U+180E, runs `--write`, asserts both removed and surrounding text
      preserved character-for-character ('# Title\n\nBenigntext.\n').
    - re-runs the validator without `--write` and asserts exit 0,
      confirming the sanitizer's output is idempotent under the
      extended denylist.
    
    Test count: 5 → 14 in this file; full `yarn test` green; `yarn lint`
    clean.
  • fix(hooks): log fail-open breadcrumb on parse/read errors in metrics bridge
    coderabbitai flagged: the two `catch` blocks in `readSessionCost`
    silently swallowed every failure mode. A malformed `costs.jsonl`
    row, a permission error opening the file, or any other unexpected
    I/O failure would silently return zero cost — masking real
    problems and feeding stale or zero numbers into
    `ecc-context-monitor.js` (which then injects them as
    `additionalContext` into the live model turn).
    
    Fix two things, both fail-open-preserving:
    
    1. **Inner JSON.parse catch** — count malformed lines and write
       one aggregated breadcrumb per call:
    
         [ecc-metrics-bridge] skipped N malformed line(s) in <path>
    
       Aggregating (rather than per-line) keeps a log-flooded
       `costs.jsonl` diagnosable without overwhelming stderr.
    
    2. **Outer fs.readFileSync catch** — write a breadcrumb on real
       errors, but stay silent on `ENOENT`. The "no costs.jsonl yet"
       case is genuinely normal (no Stop event has fired this session)
       and producing noise on every PreToolUse before the first Stop
       would be reviewer-visible spam. All other error codes
       (`EACCES`, `EISDIR`, `EMFILE`, …) get:
    
         [ecc-metrics-bridge] failing open after <name> reading <path>: <msg>
    
    In both cases the function still returns the zero-cost fallback
    so the bridge never breaks tool execution — only the
    diagnosability changes.
    
    Two new regression tests in
    `tests/hooks/ecc-metrics-bridge.test.js`:
    
      ✓ readSessionCost writes a stderr breadcrumb when malformed
        lines are skipped — feeds 4 rows (2 valid, 2 malformed),
        asserts the last valid row still wins AND captured stderr
        contains "skipped 2 malformed line(s)".
    
      ✓ readSessionCost stays silent when costs.jsonl does not exist
        (ENOENT) — uses a fresh tmp HOME with no metrics dir, asserts
        zero return AND empty stderr.
    
    Test count: 16 → 18; `npm test` green; `yarn lint` clean.
  • fix(hooks): scan full costs.jsonl when locating session row
    `readSessionCost` read only the trailing 8 KiB of
    `~/.claude/metrics/costs.jsonl` to "avoid scanning entire file".
    That ceiling is the opposite-sign sibling of the double-count bug
    fixed in the previous commit: once a session's most recent
    cumulative row gets pushed past the 8 KiB window by newer rows
    from other sessions, the bridge silently reports `totalCost: 0`,
    `totalIn: 0`, `totalOut: 0` for that session — same false signal
    to `ecc-context-monitor.js`, same wrong number injected into the
    live model turn as `additionalContext`.
    
    `cost-tracker.js` has no rotation policy, so on any non-trivial
    workstation costs.jsonl grows past 8 KiB within minutes of normal
    use. For users who keep multiple concurrent sessions, this means
    the second-and-later sessions silently report zero almost
    immediately.
    
    Reproduced before this commit:
    
      $ HOME=/tmp/eccc node -e '
          const fs = require("fs");
          const m = require("./scripts/hooks/ecc-metrics-bridge.js");
          // S1 row at file start, then 200 rows of OTHER-session noise (~16 KiB).
          // S1 is the row we want, but it sits past the 8 KiB tail.
          const s1 = `{"session_id":"S1","estimated_cost_usd":0.5,"input_tokens":500,"output_tokens":250}`;
          const other = `{"session_id":"OTHER","estimated_cost_usd":1,"input_tokens":100,"output_tokens":50}`;
          fs.mkdirSync("/tmp/eccc/.claude/metrics", { recursive: true });
          fs.writeFileSync("/tmp/eccc/.claude/metrics/costs.jsonl",
            [s1, ...Array(200).fill(other)].join("\\n") + "\\n");
          console.log(JSON.stringify(m.readSessionCost("S1")));'
      {"totalCost":0,"totalIn":0,"totalOut":0}
    
    Expected: `{"totalCost":0.5, "totalIn":500, "totalOut":250}` (the
    S1 row that exists in the file).
    Actual: zero — the row is past the 8 KiB tail.
    
    Fix: drop the `fs.openSync` + bounded `fs.readSync` + position
    arithmetic in favour of `fs.readFileSync(costsPath, 'utf8')` and
    iterate every line. Each row is ~150 bytes; even 100k rows is
    ~15 MB and a single sync read on PreToolUse is in the low ms.
    If file rotation lands in `cost-tracker.js` later, this scan
    becomes proportionally cheaper.
    
    After this commit the reproduction above returns
    `{"totalCost":0.5, "totalIn":500, "totalOut":250}`.
    
    Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
    `readSessionCost finds session row beyond the old 8 KiB tail
    boundary`. The test asserts the costs.jsonl fixture is > 8 KiB
    before reading so any reintroduction of a bounded tail would
    re-fail the test (i.e. the assertion is the contract, not the
    specific number 8192).
    
    Together with the previous commit, both directions of the
    metrics-bridge cost-reporting bug are closed.
  • fix(hooks): use last cumulative row for session cost in metrics bridge
    `ecc-metrics-bridge.js#readSessionCost` summed the
    `estimated_cost_usd`, `input_tokens`, and `output_tokens` of
    every matching row in `~/.claude/metrics/costs.jsonl`. That breaks
    the documented contract of `scripts/hooks/cost-tracker.js`, which
    explicitly states (in its module docblock):
    
      Cumulative behavior: Stop fires per assistant response, not
      per session. Each row therefore represents the cumulative
      session total up to that point. To get per-session cost, take
      the last row per session_id.
    
    Summing N cumulative rows over-counts by roughly (N+1)/2 ×. For a
    session with 3 rows at 0.01, 0.02, 0.03 USD (true running total
    0.03), the bridge today reports 0.06 USD. The over-counted value
    feeds `ecc-context-monitor.js`, which then trips its
    COST_NOTICE_USD / COST_WARNING_USD / COST_CRITICAL_USD thresholds
    on phantom spend AND injects the inflated number as
    `additionalContext` into the live model turn — so the agent
    itself is told a wrong cost.
    
    Reproduced on `main` before this commit:
    
      $ cat > /tmp/eccc/.claude/metrics/costs.jsonl <<EOF
      {"session_id":"S1","estimated_cost_usd":0.01,"input_tokens":333,"output_tokens":166}
      {"session_id":"S1","estimated_cost_usd":0.02,"input_tokens":666,"output_tokens":333}
      {"session_id":"S1","estimated_cost_usd":0.03,"input_tokens":1000,"output_tokens":500}
      EOF
    
      $ HOME=/tmp/eccc node -e 'const m = require("./scripts/hooks/ecc-metrics-bridge.js"); \
          console.log(JSON.stringify(m.readSessionCost("S1")))'
      {"totalCost":0.06,"totalIn":1999,"totalOut":999}
    
    Expected: `{"totalCost":0.03,"totalIn":1000,"totalOut":500}` (the
    last cumulative row).
    Actual: 2× over-count.
    
    Fix: replace `+=` with `=` in the matching branch so the assigned
    values reflect the most recent row encountered. The iteration
    order is file order, which is also event time order, so the last
    assignment wins — exactly the contract cost-tracker writes
    against.
    
    After this commit the reproduction above returns
    `{"totalCost":0.03,"totalIn":1000,"totalOut":500}`.
    
    Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
    `readSessionCost returns the LAST cumulative row, not the sum
    (cost-tracker contract)`. The existing
    `readSessionCost does not include unrelated default-session rows`
    test happened to pass even with the bug because it only had one
    target-session row — single-row sessions are coincidentally
    correct under both formulas. The new test uses three rows so the
    two formulas diverge.
    
    A second issue in the same function — the 8 KiB tail-only read
    silently drops older rows once a session's recent cumulative
    totals scroll past that window — is fixed in the next commit.
  • test(ci): coverage for round-1 fixes (quoted write-all, dedup, lifecycle scope)
    Three test changes in response to the round-1 review:
    
    1. **Add quoted-write-all coverage** (cubic P0 follow-up).
       Two new cases assert the regex now matches the double-quoted and
       single-quoted YAML forms of `permissions: "write-all"`:
         - `rejects double-quoted permissions: "write-all"`
         - `rejects single-quoted permissions: 'write-all'`
       Both fixtures trigger only the persist-credentials gate, so they
       exercise the WRITE_ALL_PATTERN OR-clause in isolation.
    
    2. **Add expression+ref dedup coverage** (greptile P2 follow-up).
       `emits a single violation when both expressionPattern and refPattern
       match the same step` — uses `refs/pull/${{ … head.sha }}/merge` as
       the fixture (which matches both patterns) and counts ERROR lines for
       the `pull_request_target` rule, asserting exactly one. Re-introducing
       the duplicate-push bug would re-fail this test immediately.
    
    3. **Drop the `npm ci without --ignore-scripts under write-all` test**
       (greptile P2). That test happened to pass under the previous
       `--ignore-scripts` regex, but `UNSAFE_INSTALL_PATTERNS` (added in
       `f7035b56`) fires unconditionally for every workflow regardless of
       permissions. So the test was exercising a pre-existing code path
       that has nothing to do with WRITE_ALL_PATTERN. Reviewer flagged this
       could mislead future contributors into thinking lifecycle-script
       enforcement is gated on write permissions.
    
       Replaced by the surrounding `rejects checkout credential persistence
       in workflows with permissions: write-all` test (already present) and
       the new quoted-form tests above, which all exercise the actual
       persist-credentials gate that the WRITE_ALL_PATTERN clause newly
       activates.
    
    Test count: 22 → 24 (added 3 new, dropped 1). All green; `yarn lint`
    clean.
    
    The cohort comment above the write-all block was also tightened to
    explicitly note that "the lifecycle-script gate already fires
    unconditionally for every workflow" so the next reader sees the
    distinction up front.
  • fix(ci): flag refs/pull checkouts under pull_request_target
    The `pull_request_target` rule's `expressionPattern` matches only
    the canonical `github.event.pull_request.head.{ref,sha,repo.full_name}`
    interpolations. It does not match the second canonical form of
    the same exploit — fetching `refs/pull/<N>/{head,merge}` directly:
    
      - uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge
    
    The merge-ref variant is what GitHub's own security guidance calls
    out as the highest-severity privilege-escalation pattern under
    `pull_request_target`: it materialises the PR's merge commit
    (attacker code spliced with base), executes inside a workflow that
    has full repo-scoped tokens, and gives the attacker the chance to
    exfiltrate secrets or push to default branches. `refs/pull/N/head`
    is functionally equivalent — same source, same trust boundary.
    
    Reproduced on `main` before this commit:
    
      $ cat /tmp/bad.yml
      name: bad
      on: { pull_request_target: { types: [opened] } }
      permissions: { contents: read }
      jobs:
        do:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4
              with:
                ref: refs/pull/${{ github.event.pull_request.number }}/merge
                persist-credentials: false
            - run: npm ci --ignore-scripts
    
      $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
      Validated workflow security for 1 workflow files
      $ echo $?
      0
    
    Expected: violation flagging the refs/pull checkout under pull_request_target.
    Actual: passes silently.
    
    Fix: add a `refPattern` to the `pull_request_target` rule:
    
        /^\s*ref:\s*['"]?[^'"\n]*refs\/(?:remotes\/)?pull\/[^'"\n\s]+/m
    
    and apply it per checkout step inside the existing
    event-gated loop. The pattern matches the ref VALUE so it catches
    all interpolation shapes — `refs/pull/123/head`,
    `refs/pull/${{ github.event.pull_request.number }}/merge`,
    `${{ env.FOO }}/refs/pull/N/head` — without enumerating the
    possible interpolations themselves.
    
    Scoping: the rule is already gated on the workflow containing
    `pull_request_target:`, so non-privileged `pull_request` workflows
    that legitimately check out a PR ref are not affected.
    
    After this commit the reproduction above exits 1 with:
    
      ERROR: bad.yml:10 - pull_request_target must not checkout an untrusted pull_request head ref/repository
    
    Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
      - rejects pull_request_target + refs/pull/<N>/merge
      - rejects pull_request_target + hardcoded refs/pull/<N>/head
      - allows pull_request_target with no `with.ref:` (base-ref checkout —
        the safe pattern from GitHub's own guidance)
    
    Test count: 17 → 20 in this file; full `yarn test` still green.
    
    Together with the previous commit, this closes the two
    independent `validate-workflow-security.js` bypasses I found.
  • fix(ci): treat 'permissions: write-all' as a write-permission gate
    `WRITE_PERMISSION_PATTERN` in `validate-workflow-security.js`
    enumerates named GitHub Actions scopes (`contents: write`,
    `issues: write`, etc.) to decide whether a workflow needs to:
      - disable `persist-credentials` on `actions/checkout`
      - pass `--ignore-scripts` to `npm ci`
    
    The pattern misses the top-level shorthand `permissions:
    write-all`, which is the strictly broader form — it grants every
    named scope write access in a single line. As a result, a
    workflow that opts into write-all currently slips both gates.
    
    Reproduced on `main` before this commit:
    
      $ cat /tmp/bad.yml
      name: bad
      on: [push]
      permissions: write-all
      jobs:
        do:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4
            - run: npm ci
    
      $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
      Validated workflow security for 1 workflow files
      $ echo $?
      0
    
    Expected: at least two violations (missing `persist-credentials:
    false`, missing `--ignore-scripts`).
    Actual: passes silently.
    
    Fix: add a sibling pattern `WRITE_ALL_PATTERN` that matches
    `^\s*permissions:\s*write-all\b` and OR it with
    `WRITE_PERMISSION_PATTERN` at the single gate. Both top-level
    and job-level `permissions:` blocks satisfy the `^\s*` prefix.
    
    After this commit the reproduction above exits 1 with:
    
      ERROR: bad.yml:8 - workflows with write permissions must disable checkout credential persistence
      ERROR: bad.yml:9 - workflows with write permissions must install npm dependencies with --ignore-scripts
    
    Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
      - rejects write-all + credential-persisting checkout
      - rejects write-all + `npm ci` without `--ignore-scripts`
      - allows write-all when both gates are satisfied (no over-block)
    
    Test count: 14 → 17 in this file; full `yarn test` still green.
    
    A separate `refs/pull/N/merge` bypass under `pull_request_target`
    exists in the same validator and is fixed in the next commit.