Commit Graph

604 Commits

  • fix(hooks): log fail-open breadcrumb on parse/read errors in metrics bridge
    coderabbitai flagged: the two `catch` blocks in `readSessionCost`
    silently swallowed every failure mode. A malformed `costs.jsonl`
    row, a permission error opening the file, or any other unexpected
    I/O failure would silently return zero cost — masking real
    problems and feeding stale or zero numbers into
    `ecc-context-monitor.js` (which then injects them as
    `additionalContext` into the live model turn).
    
    Fix two things, both fail-open-preserving:
    
    1. **Inner JSON.parse catch** — count malformed lines and write
       one aggregated breadcrumb per call:
    
         [ecc-metrics-bridge] skipped N malformed line(s) in <path>
    
       Aggregating (rather than per-line) keeps a log-flooded
       `costs.jsonl` diagnosable without overwhelming stderr.
    
    2. **Outer fs.readFileSync catch** — write a breadcrumb on real
       errors, but stay silent on `ENOENT`. The "no costs.jsonl yet"
       case is genuinely normal (no Stop event has fired this session)
       and producing noise on every PreToolUse before the first Stop
       would be reviewer-visible spam. All other error codes
       (`EACCES`, `EISDIR`, `EMFILE`, …) get:
    
         [ecc-metrics-bridge] failing open after <name> reading <path>: <msg>
    
    In both cases the function still returns the zero-cost fallback
    so the bridge never breaks tool execution — only the
    diagnosability changes.
    
    Two new regression tests in
    `tests/hooks/ecc-metrics-bridge.test.js`:
    
      ✓ readSessionCost writes a stderr breadcrumb when malformed
        lines are skipped — feeds 4 rows (2 valid, 2 malformed),
        asserts the last valid row still wins AND captured stderr
        contains "skipped 2 malformed line(s)".
    
      ✓ readSessionCost stays silent when costs.jsonl does not exist
        (ENOENT) — uses a fresh tmp HOME with no metrics dir, asserts
        zero return AND empty stderr.
    
    Test count: 16 → 18; `npm test` green; `yarn lint` clean.
  • fix(hooks): scan full costs.jsonl when locating session row
    `readSessionCost` read only the trailing 8 KiB of
    `~/.claude/metrics/costs.jsonl` to "avoid scanning entire file".
    That ceiling is the opposite-sign sibling of the double-count bug
    fixed in the previous commit: once a session's most recent
    cumulative row gets pushed past the 8 KiB window by newer rows
    from other sessions, the bridge silently reports `totalCost: 0`,
    `totalIn: 0`, `totalOut: 0` for that session — same false signal
    to `ecc-context-monitor.js`, same wrong number injected into the
    live model turn as `additionalContext`.
    
    `cost-tracker.js` has no rotation policy, so on any non-trivial
    workstation costs.jsonl grows past 8 KiB within minutes of normal
    use. For users who keep multiple concurrent sessions, this means
    the second-and-later sessions silently report zero almost
    immediately.
    
    Reproduced before this commit:
    
      $ HOME=/tmp/eccc node -e '
          const fs = require("fs");
          const m = require("./scripts/hooks/ecc-metrics-bridge.js");
          // S1 row at file start, then 200 rows of OTHER-session noise (~16 KiB).
          // S1 is the row we want, but it sits past the 8 KiB tail.
          const s1 = `{"session_id":"S1","estimated_cost_usd":0.5,"input_tokens":500,"output_tokens":250}`;
          const other = `{"session_id":"OTHER","estimated_cost_usd":1,"input_tokens":100,"output_tokens":50}`;
          fs.mkdirSync("/tmp/eccc/.claude/metrics", { recursive: true });
          fs.writeFileSync("/tmp/eccc/.claude/metrics/costs.jsonl",
            [s1, ...Array(200).fill(other)].join("\\n") + "\\n");
          console.log(JSON.stringify(m.readSessionCost("S1")));'
      {"totalCost":0,"totalIn":0,"totalOut":0}
    
    Expected: `{"totalCost":0.5, "totalIn":500, "totalOut":250}` (the
    S1 row that exists in the file).
    Actual: zero — the row is past the 8 KiB tail.
    
    Fix: drop the `fs.openSync` + bounded `fs.readSync` + position
    arithmetic in favour of `fs.readFileSync(costsPath, 'utf8')` and
    iterate every line. Each row is ~150 bytes; even 100k rows is
    ~15 MB and a single sync read on PreToolUse is in the low ms.
    If file rotation lands in `cost-tracker.js` later, this scan
    becomes proportionally cheaper.
    
    After this commit the reproduction above returns
    `{"totalCost":0.5, "totalIn":500, "totalOut":250}`.
    
    Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
    `readSessionCost finds session row beyond the old 8 KiB tail
    boundary`. The test asserts the costs.jsonl fixture is > 8 KiB
    before reading so any reintroduction of a bounded tail would
    re-fail the test (i.e. the assertion is the contract, not the
    specific number 8192).
    
    Together with the previous commit, both directions of the
    metrics-bridge cost-reporting bug are closed.
  • fix(hooks): use last cumulative row for session cost in metrics bridge
    `ecc-metrics-bridge.js#readSessionCost` summed the
    `estimated_cost_usd`, `input_tokens`, and `output_tokens` of
    every matching row in `~/.claude/metrics/costs.jsonl`. That breaks
    the documented contract of `scripts/hooks/cost-tracker.js`, which
    explicitly states (in its module docblock):
    
      Cumulative behavior: Stop fires per assistant response, not
      per session. Each row therefore represents the cumulative
      session total up to that point. To get per-session cost, take
      the last row per session_id.
    
    Summing N cumulative rows over-counts by roughly (N+1)/2 ×. For a
    session with 3 rows at 0.01, 0.02, 0.03 USD (true running total
    0.03), the bridge today reports 0.06 USD. The over-counted value
    feeds `ecc-context-monitor.js`, which then trips its
    COST_NOTICE_USD / COST_WARNING_USD / COST_CRITICAL_USD thresholds
    on phantom spend AND injects the inflated number as
    `additionalContext` into the live model turn — so the agent
    itself is told a wrong cost.
    
    Reproduced on `main` before this commit:
    
      $ cat > /tmp/eccc/.claude/metrics/costs.jsonl <<EOF
      {"session_id":"S1","estimated_cost_usd":0.01,"input_tokens":333,"output_tokens":166}
      {"session_id":"S1","estimated_cost_usd":0.02,"input_tokens":666,"output_tokens":333}
      {"session_id":"S1","estimated_cost_usd":0.03,"input_tokens":1000,"output_tokens":500}
      EOF
    
      $ HOME=/tmp/eccc node -e 'const m = require("./scripts/hooks/ecc-metrics-bridge.js"); \
          console.log(JSON.stringify(m.readSessionCost("S1")))'
      {"totalCost":0.06,"totalIn":1999,"totalOut":999}
    
    Expected: `{"totalCost":0.03,"totalIn":1000,"totalOut":500}` (the
    last cumulative row).
    Actual: 2× over-count.
    
    Fix: replace `+=` with `=` in the matching branch so the assigned
    values reflect the most recent row encountered. The iteration
    order is file order, which is also event time order, so the last
    assignment wins — exactly the contract cost-tracker writes
    against.
    
    After this commit the reproduction above returns
    `{"totalCost":0.03,"totalIn":1000,"totalOut":500}`.
    
    Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
    `readSessionCost returns the LAST cumulative row, not the sum
    (cost-tracker contract)`. The existing
    `readSessionCost does not include unrelated default-session rows`
    test happened to pass even with the bug because it only had one
    target-session row — single-row sessions are coincidentally
    correct under both formulas. The new test uses three rows so the
    two formulas diverge.
    
    A second issue in the same function — the 8 KiB tail-only read
    silently drops older rows once a session's recent cumulative
    totals scroll past that window — is fixed in the next commit.
  • test(ci): coverage for round-1 fixes (quoted write-all, dedup, lifecycle scope)
    Three test changes in response to the round-1 review:
    
    1. **Add quoted-write-all coverage** (cubic P0 follow-up).
       Two new cases assert the regex now matches the double-quoted and
       single-quoted YAML forms of `permissions: "write-all"`:
         - `rejects double-quoted permissions: "write-all"`
         - `rejects single-quoted permissions: 'write-all'`
       Both fixtures trigger only the persist-credentials gate, so they
       exercise the WRITE_ALL_PATTERN OR-clause in isolation.
    
    2. **Add expression+ref dedup coverage** (greptile P2 follow-up).
       `emits a single violation when both expressionPattern and refPattern
       match the same step` — uses `refs/pull/${{ … head.sha }}/merge` as
       the fixture (which matches both patterns) and counts ERROR lines for
       the `pull_request_target` rule, asserting exactly one. Re-introducing
       the duplicate-push bug would re-fail this test immediately.
    
    3. **Drop the `npm ci without --ignore-scripts under write-all` test**
       (greptile P2). That test happened to pass under the previous
       `--ignore-scripts` regex, but `UNSAFE_INSTALL_PATTERNS` (added in
       `f7035b56`) fires unconditionally for every workflow regardless of
       permissions. So the test was exercising a pre-existing code path
       that has nothing to do with WRITE_ALL_PATTERN. Reviewer flagged this
       could mislead future contributors into thinking lifecycle-script
       enforcement is gated on write permissions.
    
       Replaced by the surrounding `rejects checkout credential persistence
       in workflows with permissions: write-all` test (already present) and
       the new quoted-form tests above, which all exercise the actual
       persist-credentials gate that the WRITE_ALL_PATTERN clause newly
       activates.
    
    Test count: 22 → 24 (added 3 new, dropped 1). All green; `yarn lint`
    clean.
    
    The cohort comment above the write-all block was also tightened to
    explicitly note that "the lifecycle-script gate already fires
    unconditionally for every workflow" so the next reader sees the
    distinction up front.
  • fix(ci): flag refs/pull checkouts under pull_request_target
    The `pull_request_target` rule's `expressionPattern` matches only
    the canonical `github.event.pull_request.head.{ref,sha,repo.full_name}`
    interpolations. It does not match the second canonical form of
    the same exploit — fetching `refs/pull/<N>/{head,merge}` directly:
    
      - uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge
    
    The merge-ref variant is what GitHub's own security guidance calls
    out as the highest-severity privilege-escalation pattern under
    `pull_request_target`: it materialises the PR's merge commit
    (attacker code spliced with base), executes inside a workflow that
    has full repo-scoped tokens, and gives the attacker the chance to
    exfiltrate secrets or push to default branches. `refs/pull/N/head`
    is functionally equivalent — same source, same trust boundary.
    
    Reproduced on `main` before this commit:
    
      $ cat /tmp/bad.yml
      name: bad
      on: { pull_request_target: { types: [opened] } }
      permissions: { contents: read }
      jobs:
        do:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4
              with:
                ref: refs/pull/${{ github.event.pull_request.number }}/merge
                persist-credentials: false
            - run: npm ci --ignore-scripts
    
      $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
      Validated workflow security for 1 workflow files
      $ echo $?
      0
    
    Expected: violation flagging the refs/pull checkout under pull_request_target.
    Actual: passes silently.
    
    Fix: add a `refPattern` to the `pull_request_target` rule:
    
        /^\s*ref:\s*['"]?[^'"\n]*refs\/(?:remotes\/)?pull\/[^'"\n\s]+/m
    
    and apply it per checkout step inside the existing
    event-gated loop. The pattern matches the ref VALUE so it catches
    all interpolation shapes — `refs/pull/123/head`,
    `refs/pull/${{ github.event.pull_request.number }}/merge`,
    `${{ env.FOO }}/refs/pull/N/head` — without enumerating the
    possible interpolations themselves.
    
    Scoping: the rule is already gated on the workflow containing
    `pull_request_target:`, so non-privileged `pull_request` workflows
    that legitimately check out a PR ref are not affected.
    
    After this commit the reproduction above exits 1 with:
    
      ERROR: bad.yml:10 - pull_request_target must not checkout an untrusted pull_request head ref/repository
    
    Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
      - rejects pull_request_target + refs/pull/<N>/merge
      - rejects pull_request_target + hardcoded refs/pull/<N>/head
      - allows pull_request_target with no `with.ref:` (base-ref checkout —
        the safe pattern from GitHub's own guidance)
    
    Test count: 17 → 20 in this file; full `yarn test` still green.
    
    Together with the previous commit, this closes the two
    independent `validate-workflow-security.js` bypasses I found.
  • fix(ci): treat 'permissions: write-all' as a write-permission gate
    `WRITE_PERMISSION_PATTERN` in `validate-workflow-security.js`
    enumerates named GitHub Actions scopes (`contents: write`,
    `issues: write`, etc.) to decide whether a workflow needs to:
      - disable `persist-credentials` on `actions/checkout`
      - pass `--ignore-scripts` to `npm ci`
    
    The pattern misses the top-level shorthand `permissions:
    write-all`, which is the strictly broader form — it grants every
    named scope write access in a single line. As a result, a
    workflow that opts into write-all currently slips both gates.
    
    Reproduced on `main` before this commit:
    
      $ cat /tmp/bad.yml
      name: bad
      on: [push]
      permissions: write-all
      jobs:
        do:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4
            - run: npm ci
    
      $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
      Validated workflow security for 1 workflow files
      $ echo $?
      0
    
    Expected: at least two violations (missing `persist-credentials:
    false`, missing `--ignore-scripts`).
    Actual: passes silently.
    
    Fix: add a sibling pattern `WRITE_ALL_PATTERN` that matches
    `^\s*permissions:\s*write-all\b` and OR it with
    `WRITE_PERMISSION_PATTERN` at the single gate. Both top-level
    and job-level `permissions:` blocks satisfy the `^\s*` prefix.
    
    After this commit the reproduction above exits 1 with:
    
      ERROR: bad.yml:8 - workflows with write permissions must disable checkout credential persistence
      ERROR: bad.yml:9 - workflows with write permissions must install npm dependencies with --ignore-scripts
    
    Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
      - rejects write-all + credential-persisting checkout
      - rejects write-all + `npm ci` without `--ignore-scripts`
      - allows write-all when both gates are satisfied (no over-block)
    
    Test count: 14 → 17 in this file; full `yarn test` still green.
    
    A separate `refs/pull/N/merge` bypass under `pull_request_target`
    exists in the same validator and is fixed in the next commit.
  • chore: update statusline ANSI color palette
    - Replace blinking red (5;31m) with bold red (1;31m) for critical context bar
    - Replace cyan metrics (36m) with sky blue (38;5;117m)
    - Replace plain bold task (1m) with bold bright white (1;97m)
    - Update test assertion to match new bold red code