ci: audit pnpm test with bokuweb/coronarium #650

Merged
bokuweb merged 12 commits into main from claude/wizardly-hopper-aa18eb on May 10, 2026

bokuweb commented on May 10, 2026

Member

Summary

  • Adds .github/coronarium.yml — a default-allow audit policy (no blocking).
  • In ci.yml, wraps the pnpm test step on Linux and Windows with bokuweb/coronarium@v0 so every exec / open / connect syscall made by the CLI + library integration tests gets recorded (eBPF on Linux, ETW on Windows). macOS stays unsupervised — coronarium's supervised mode is Linux/Windows only.
  • Uploads coronarium-report.html + coronarium.log.json as a per-OS artifact, and on PRs upserts a single comment with the Linux report via bokuweb/coronarium/comment@v0.
  • Adds pull-requests: write so the comment action can post.

mode: audit means nothing fails because of coronarium — this just gives us per-PR visibility into what the test step actually does. Once the audit signal is trusted we can flip individual deny rules on (or move to mode: block) in a follow-up.

Test plan

  • CI matrix passes on ubuntu-latest, macos-latest, windows-latest
  • Job summary shows the coronarium audit table on Linux + Windows
  • coronarium-report-ubuntu-latest / coronarium-report-windows-latest artifacts are uploaded
  • On the PR a coronarium-report comment appears with the Linux verdict counts
  • No regressions in pnpm test runtime/output

🤖 Generated with Claude Code

bokuweb and others added 2 commits May 10, 2026 14:26
Wraps the `pnpm test` step in `bokuweb/coronarium@v0` (mode: audit)
on Linux + Windows so every exec/open/connect made during the CLI
+ library integration tests is recorded. Nothing is blocked — the
.github/coronarium.yml policy is default-allow on both network
and file. macOS is left unsupervised because coronarium's
supervised run is Linux/Windows only.

Reports are uploaded as `coronarium-report-<os>` artifacts and,
on pull requests, the Linux run is mirrored into a PR comment via
bokuweb/coronarium/comment@v0.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
sudo resets PATH to the sudoers `secure_path` even with `-E`, so
`sudo -E $CORONARIUM_BIN run -- pnpm test` failed with `spawning
pnpm: No such file or directory`. Wrap the command with
`env "PATH=$PATH"` so the runner user's PATH reaches the
supervised child.
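
The PATH fix above can be sketched as an argv builder. This is a minimal illustration, not the workflow's actual step: `build_supervised_argv` and the `/opt/coronarium` path are hypothetical, and only the `sudo -E env "PATH=$PATH" ... run -- pnpm test` shape comes from the commit message.

```python
import os
import shlex


def build_supervised_argv(coronarium_bin, child_argv, path=None):
    """Build a sudo command line that forwards the caller's PATH.

    sudo resets PATH, so PATH is re-injected through `env` on the far
    side of sudo, where the supervised child inherits it.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    return ["sudo", "-E", "env", f"PATH={path}",
            coronarium_bin, "run", "--", *child_argv]


# Hypothetical binary location and PATH value, for illustration only.
argv = build_supervised_argv(
    "/opt/coronarium", ["pnpm", "test"], path="/home/runner/bin:/usr/bin"
)
print(shlex.join(argv))
```

Passing PATH as an explicit `env` argument keeps the fix independent of how the runner's sudoers file is configured.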

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
github-actions Bot commented May 10, 2026

coronarium report

| metric | count |
| --- | --- |
| observed | 93790 |
| denied | 0 |
| lost | 0 |

events by kind

| kind | count |
| --- | --- |
| exec | 64 |
| open | 64 |
| connect | 64 |
📊 Open the full HTML report locally
rm -rf /tmp/coronarium-25623378865 && gh run download 25623378865 -R reg-viz/reg-cli -n coronarium-report-ubuntu-latest -D /tmp/coronarium-25623378865 && (open /tmp/coronarium-25623378865/coronarium-report.html 2>/dev/null || xdg-open /tmp/coronarium-25623378865/coronarium-report.html 2>/dev/null || echo "open file:///tmp/coronarium-25623378865/coronarium-report.html")

Requires the gh CLI. The command downloads the workflow artifact and opens the self-contained HTML report in your browser.

`pnpm` on Windows is a `.cmd` shim, but Rust's
`Command::new("pnpm")` does not honour PATHEXT and only finds
`pnpm.exe`, so coronarium-win failed to spawn the supervised
child with `program not found`. Wrap with `cmd /c` so the shim
gets resolved.
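
To illustrate the difference between an `.exe`-only lookup and a PATHEXT-aware one, here is a small sketch. It is a simplified model of Windows command resolution, not coronarium-win's code; `resolve_with_pathext` and `resolve_exe_only` are hypothetical helpers and the extension list is an assumed typical PATHEXT value.

```python
import os
import tempfile


def resolve_with_pathext(name, search_dirs,
                         pathext=(".COM", ".EXE", ".BAT", ".CMD")):
    """Return the first `name` + extension found in search_dirs, else None.

    Each directory is tried with every PATHEXT extension in order,
    roughly as a Windows shell resolves a bare command name.
    """
    for directory in search_dirs:
        for ext in pathext:
            candidate = os.path.join(directory, name + ext.lower())
            if os.path.isfile(candidate):
                return candidate
    return None


def resolve_exe_only(name, search_dirs):
    """Lookup that only considers `<name>.exe`."""
    return resolve_with_pathext(name, search_dirs, pathext=(".EXE",))


with tempfile.TemporaryDirectory() as bin_dir:
    # Empty file standing in for the pnpm `.cmd` shim.
    open(os.path.join(bin_dir, "pnpm.cmd"), "w").close()
    found_shim = resolve_with_pathext("pnpm", [bin_dir])
    found_exe = resolve_exe_only("pnpm", [bin_dir])

print(found_shim)
print(found_exe)
```

The `.exe`-only lookup returns nothing for a directory that holds just `pnpm.cmd`, which matches the `program not found` failure; wrapping with `cmd /c` delegates the lookup to a resolver that does try the shim extensions.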

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb and others added 3 commits May 10, 2026 14:57
Audit run (5518 events) showed nothing suspicious — only node /
pnpm / corepack stack opens (libc, hostedtoolcache, .cache/node,
/proc, /sys, /tmp, /var/lib/waagent for the Azure VM Agent) plus
PATH-search probes for `sh`. Zero connect events: pnpm install
runs in an earlier unsupervised step, and the wasm CLI tests
operate purely on local fixture images.

Tighten the policy accordingly:

- network: default deny (no allow list). 0 connects observed, so
  any new outbound during tests is a genuine signal worth failing
  CI on.
- file: default allow + deny-list of sensitive paths
  (shadow/sudoers/ssh-host-keys/.ssh/.aws/.gnupg/.netrc/
  .docker/docker.sock + Windows equivalents). A whitelist on disk
  would be both noisy and brittle for a node + pnpm runtime.
- process.deny_exec: basename block of typical exfil / lateral-
  movement tools (curl, wget, nc, ssh, scp, rsync, …). Tests only
  need node + pnpm + sh; anything on this list appearing means
  something new is running inside the supervised step.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Restructures the matrix job so every `run:` step (corepack enable,
build-ui.sh, cargo test, pnpm install, pnpm build, pnpm test) runs
inside a single coronarium-supervised invocation on Linux + Windows.
macOS keeps the unsupervised path.

The policy is now default-deny on network and file, with an
exec deny-list as the third pillar:

- network: explicit allow for 127.0.0.53:53 (systemd-resolved stub),
  github.com / codeload.github.com / objects.githubusercontent.com
  (build-ui.sh git clone), registry.npmjs.org (pnpm install), and
  index.crates.io / static.crates.io / crates.io (cargo deps).
  Anything else fails the connect at the cgroup eBPF layer.

- file: default deny + a broad FHS allow list (/bin /sbin /lib
  /lib64 /usr /etc /proc /sys /dev /tmp /run /var /opt /snap /home
  + C:\ on Windows), with a sensitive-path deny overlay (shadow /
  sudoers / ssh host keys / .ssh / .aws / .gnupg / .netrc /
  .docker / docker.sock / /var/run/secrets, plus Windows
  equivalents). Deny wins over allow, so the credential paths stay
  locked even though /etc and /home are on the allow list.

- process.deny_exec: basename-match block on curl / wget / nc /
  ncat / socat / ssh / scp / sftp / rsync / telnet / ftp. Tests
  legitimately need node, pnpm, sh, bash, git, cargo, and vite,
  none of which is on this list.
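
The "deny wins over allow" overlay and the basename match can be sketched as follows. This is an assumed model of the matching semantics described in the list above, using plain string prefixes; it is not coronarium's matcher, and the function names and the shortened rule lists are illustrative.

```python
import posixpath


def has_prefix(path, prefixes):
    """True if path starts with any of the given string prefixes."""
    return any(path.startswith(p) for p in prefixes)


def file_verdict(path, allow, deny, default="deny"):
    """Check the deny overlay first so it wins over a broader allow."""
    if has_prefix(path, deny):
        return "deny"
    if has_prefix(path, allow):
        return "allow"
    return default


def exec_denied(exe_path, deny_basenames):
    """Basename match: only the final path component is compared."""
    return posixpath.basename(exe_path) in deny_basenames


ALLOW = ["/etc", "/home", "/usr", "/tmp"]      # subset of the allow list
DENY = ["/etc/shadow", "/home/runner/.ssh"]    # illustrative deny overlay
DENY_EXEC = {"curl", "wget", "nc", "ssh", "scp", "rsync"}

print(file_verdict("/etc/hosts", ALLOW, DENY))     # under /etc, not denied
print(file_verdict("/etc/shadow", ALLOW, DENY))    # deny overlay wins
print(file_verdict("/srv/data", ALLOW, DENY))      # falls to the default
print(exec_denied("/usr/bin/curl", DENY_EXEC))
print(exec_denied("/usr/bin/node", DENY_EXEC))
```

Checking deny before allow is what lets `/etc` and `/home` stay on the allow list while the credential paths beneath them remain blocked.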

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
The first block-mode run failed because cargo's git fetch of
`image-diff-rs` from github.com hit a different IP than the one
captured at coronarium startup — github.com's GeoDNS rotates.
coronarium does support hostname re-resolution via
`--dns-refresh-interval <secs>`, which additively populates the
BPF map; enable it with a 30s interval on Linux + Windows.
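
The additive refresh described above can be modelled with a plain set. This is only a sketch of the idea: `refresh_allowed_ips` is a hypothetical helper, the resolver is a fake that replays rotating answers, and the addresses are documentation-range placeholders rather than real github.com IPs.

```python
def refresh_allowed_ips(allowed, hostnames, resolve):
    """Add freshly resolved IPs to `allowed`; never remove old ones."""
    added = set()
    for host in hostnames:
        for ip in resolve(host):
            if ip not in allowed:
                allowed.add(ip)
                added.add(ip)
    return added


# Fake resolver standing in for rotating DNS answers.
answers = iter([["192.0.2.10"], ["192.0.2.20"], ["192.0.2.10"]])


def fake_resolve(host):
    return next(answers)


allowed_ips = set()
for _ in range(3):  # three refresh ticks
    refresh_allowed_ips(allowed_ips, ["github.com"], fake_resolve)

print(sorted(allowed_ips))
```

Because entries are only ever added, a connection opened against an earlier answer stays permitted after DNS rotates to a new one.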

Also reorder `file.deny`: coronarium kernel-blocks only the first
eight prefixes (`file.deny has more than 8 entries — remaining
are audit-tagged only`). Promote the most catastrophic-on-leak
paths (shadow / sudoers / .ssh / .aws / .gnupg / .netrc /
.docker / docker.sock) into the kernel-enforced slots; the rest
stay in the policy as audit-only tags.
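
The effect of the eight-slot cap on rule ordering can be sketched like this. The split function is hypothetical and the concrete paths are illustrative stand-ins for the categories named above; only the cap of 8 comes from coronarium's warning.

```python
KERNEL_DENY_SLOTS = 8  # cap quoted in the warning message


def split_deny_list(deny_prefixes, slots=KERNEL_DENY_SLOTS):
    """Return (kernel_enforced, audit_only), preserving policy order."""
    return deny_prefixes[:slots], deny_prefixes[slots:]


deny = [
    "/etc/shadow", "/etc/sudoers", "/home/runner/.ssh", "/home/runner/.aws",
    "/home/runner/.gnupg", "/home/runner/.netrc", "/home/runner/.docker",
    "/var/run/docker.sock",
    # Entries from here on would only be audit-tagged.
    "/etc/ssh/ssh_host", "/var/run/secrets",
]
enforced, audit_only = split_deny_list(deny)
print(len(enforced))
print(audit_only)
```

Since position decides enforcement, the order of `file.deny` is itself part of the policy.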

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
Hostname-based `network.allow` rules are the dominant CI usage
(github.com, registry.npmjs.org, index.crates.io, …), and every
one of those hostnames is round-robin DNS. With the previous
default of 0 (refresh disabled), a supervised run that resolves
github.com at startup and, a few seconds later, makes a second
connect that resolves to a different IP has that second connect
denied with `Operation not permitted` from the cgroup eBPF hook.
Anyone who copy-pasted the README CI example hit this footgun the
moment they switched to `mode: block`.

Default to 15s instead. Entries are written additively (never
removed), and 15s covers a typical short-running CI matrix
without piling up many refreshes; users who really do have an
IP-only allow list can opt out with `--dns-refresh-interval 0`.

Refs reg-viz/reg-cli#650 — the downstream
adopter that surfaced this default.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
`Command::new` on Windows uses CreateProcess, which only matches
`<name>.exe` — it does not consult PATHEXT, so a bare `pnpm`
(installed by pnpm/action-setup as `pnpm.cmd`), `yarn`, `npm`,
or any other `.cmd` shim fails to spawn with `program not found`,
even though the user's shell finds it fine.

We already had `resolve_program()` (a `where`-backed lookup) used
for installing firewall block rules; route the supervised-run
spawn through the same helper so `pnpm test`, `yarn build`, etc.
work without users having to wrap with `cmd /c` or hand-resolve
absolute paths.

Surfaced from a downstream consumer in
reg-viz/reg-cli#650.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
bokuweb and others added 3 commits May 10, 2026 15:51
bokuweb/coronarium v0.30.0 ships #41, which makes 15s the default
for `--dns-refresh-interval`, so the explicit `--dns-refresh-interval 30`
we needed against v0.29.0 is now redundant. Comment updated to
reflect the new behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Two failure modes from the first block-mode run with the new
default-deny policy:

1. Linux: `git clone` died with `git-remote-https died of signal 9`
   on its first connect. libcurl (git's HTTPS backend) probes
   `$HOME/.netrc` on every HTTPS request for credential lookup,
   and coronarium's kernel-side LSM hook matches on the path
   string before filesystem resolution — so the deny fires (and
   sends SIGKILL) even though `.netrc` doesn't exist on a fresh
   runner. Same probe happens from pnpm and cargo's HTTPS code
   paths. Drop `.netrc` from `file.deny`; the deny had near-zero
   value (file is empty/absent) but breaks every HTTPS call.

2. Windows: 51 tests passed but coronarium-win exited non-zero
   with `policy violation: 417360 events denied`. coronarium-win
   itself warns `network.default=deny on Windows is audit-only`
   — Windows Defender Firewall has no clean default-outbound-deny
   primitive, so coronarium-win tags every TLS connect as denied
   in the log without actually blocking it, and `--mode block`'s
   `denied > 0 → exit 1` rule then fails tests that were never
   blocked. Force `--mode audit` on the Windows run and rely on
   Linux for real enforcement.
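
Both failure modes above come down to simple rules, sketched here under stated assumptions. `path_string_denied` models a match on the path string alone, and `supervisor_exit_code` models the `denied > 0 → exit 1` rule; both names are hypothetical, and passing the child's exit code through in audit mode is an assumption.

```python
import os
import tempfile


def path_string_denied(path, deny_prefixes):
    """Pure string match; the filesystem is never consulted."""
    return any(path.startswith(p) for p in deny_prefixes)


def supervisor_exit_code(mode, denied_events, child_exit_code):
    """Block mode fails on any denied event, even if nothing was blocked."""
    if mode == "block" and denied_events > 0:
        return 1
    return child_exit_code  # assumed pass-through in audit mode


with tempfile.TemporaryDirectory() as home:
    netrc = os.path.join(home, ".netrc")
    exists = os.path.exists(netrc)               # fresh home: no .netrc
    denied = path_string_denied(netrc, [netrc])  # the rule still matches

print(exists, denied)
print(supervisor_exit_code("block", 417360, 0))
print(supervisor_exit_code("audit", 417360, 0))
```

A probe for a file that does not exist still trips the rule, and a run whose tests all passed still exits non-zero in block mode once anything is tagged as denied.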

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
The first green-tests Linux block-mode run still failed with
`policy violation: 7925 events denied`. Audit log shows 57 of
those are TCP connects from `node` to `104.16.x.34` and
`2606:4700::*` — i.e. Cloudflare anycast IPs. registry.npmjs.org
is fronted by Cloudflare and pnpm picks IPs across the /12 that
coronarium's 15s DNS refresh can't enumerate (anycast advertises
the same prefix from many POPs; each DNS query returns a fresh
small subset).

Whitelist Cloudflare's published v4/v6 ranges
(https://www.cloudflare.com/ips/) so the CDN-fronted downloads
work. This widens the egress allowance to "any service on the
same Cloudflare POP", but that is the same effective blast
radius we already accepted by allowing the hostname.

(The remaining 3 denied opens with `filename=""` look like a
coronarium-side bug — empty path strings fall through allow checks
and hit `default: deny`. Will follow up upstream; not a blocker
here.)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
The eBPF ringbuf occasionally emits open events with an empty
`filename` — typically anonymous mmaps, deleted files, or
memfd-style opens where the kernel can't recover an absolute path.
With `file.default: deny`, those would fall through every allow
prefix check and get tagged as denied, inflating `stats.denied`
and tripping block-mode exits on runs where nothing real was
blocked.

reg-viz/reg-cli#650 hit this immediately after switching to
`default: deny`: 3 of 7925 denied opens had `filename: ""`,
emitted by `bash` and `corepack` during normal startup. Once
the network-side denies were resolved, the empty-path opens
would have been the lone remaining cause of CI failure.

An empty string isn't a path we can meaningfully police; there
is nothing to protect behind it. Treat it as not-denied at the
matcher boundary.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Switching the network pillar to `default: deny` failed in two ways:

1. With hostname allows only, pnpm install hit `node → 104.16.x.34`
   Cloudflare anycast IPs that the 15s DNS refresh hadn't yet
   added to coronarium's BPF map (registry.npmjs.org is a
   Cloudflare CDN; queries return a fresh small subset of a /12).

2. With Cloudflare's full published CIDR ranges added, coronarium
   refused to start: `bpf_map_update_elem failed: Argument list
   too long`. The userspace MAX_CIDR_EXPANSION caps each CIDR at
   65536 entries, but the BPF map itself is
   `HashMap::with_max_entries(1024)` — 64× smaller — so even one
   /16 plus the existing rules overflows.
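
The size mismatch in the second failure can be checked with Python's standard `ipaddress` module. The two constants mirror the numbers quoted above; the `104.16.0.0` prefixes are illustrative networks chosen for the arithmetic, not a statement of Cloudflare's published ranges.

```python
import ipaddress

BPF_MAP_CAPACITY = 1024      # HashMap::with_max_entries(1024)
MAX_CIDR_EXPANSION = 65536   # userspace per-CIDR cap

slash16 = ipaddress.ip_network("104.16.0.0/16")
slash12 = ipaddress.ip_network("104.16.0.0/12")

print(slash16.num_addresses)                      # hosts in one /16
print(MAX_CIDR_EXPANSION // BPF_MAP_CAPACITY)     # ratio of the two caps
print(slash12.num_addresses)                      # hosts in one /12
print(slash16.num_addresses > BPF_MAP_CAPACITY)   # a single /16 overflows
```

A single /16 already needs 65536 entries when every host is enumerated, so it cannot fit a 1024-entry map regardless of the other rules.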

Until coronarium grows an LPM-trie-backed prefix map, network
whitelisting isn't feasible for CDN-fronted CI workloads. Drop
to `default: allow` and rely on `process.deny_exec` (basename
block on curl / wget / nc / ssh / scp / rsync / …) and
`file.deny` (kernel-blocked credentials paths) for the actual
enforcement — both pillars work as advertised today.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb and others added 2 commits May 10, 2026 16:46
Previous run still failed `policy violation: 7016 events denied
in block mode` even after dropping the network whitelist. The
JSON log's sample set didn't include any of the denied events
(coronarium's sample buffer fills early and misses the bulk of
later events), but the most likely cause was the audit-only
deny entries past slot 8 — `/root/.ssh`, `/root/.aws`,
`/root/.gnupg`, `/etc/ssh/ssh_host`, `/etc/sudoers.d`,
`/var/run/secrets`, and four Windows-shaped entries — getting
hit by libcurl / libgit2 / pnpm probes (~/.ssh/known_hosts and
similar) on every HTTPS connect during the 800-package
pnpm install.

Past-slot-8 entries don't actually block (kernel-side cap is 8)
but they DO increment `stats.denied`, which fails block mode.
Net effect: audit-only deny rules are a pure footgun that gives
no protection and guarantees CI failure.

Trim to exactly 8 entries, all targeting `/home/runner/...` for
the runner user's home (or `/etc/...` for system credentials).
The `C:\Users\runneradmin\...` Windows duplicates are removed as
well. The same policy file is consumed on the Windows runner, and
although coronarium-win has no counter-overflow issue, tagging
those entries would still create noise there. If Windows needs
deny rules, we'll add a separate policy file later.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Previous run still hit `policy violation: 7011 events denied` even
with the deny list trimmed to exactly 8 kernel-blocked entries.
coronarium's sample buffer is biased toward startup events (192
samples of 92152 observed, none with denied=true), so the JSON
log doesn't tell us *which* paths were denied — but the count and
the fact tests still pass (no SIGKILL) point at one cause:

The supervised process opens many anonymous file descriptors —
`pipe:[123]`, `anon_inode:[eventfd]`, `socket:[N]`, `memfd:…` —
whose "paths" aren't real filesystem paths. None match `/lib/`,
`/etc/`, etc. prefixes, so under `default: deny` they fall
through and increment `stats.denied`. pnpm install of ~800
packages spawns thousands of such fds and the run fails block
mode every time. The matcher already has a sibling guard for
empty paths (#42 in v0.31.0); a follow-up to also skip non-
absolute paths would let `default: deny` work, but that's
upstream work.
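
The follow-up guard suggested above might look like the sketch below. It is hypothetical and not coronarium code: `should_police` and `default_deny_verdict` are invented names, the `"skip"` verdict is an assumed way to keep such events out of `stats.denied`, and the inode numbers in the samples are made up.

```python
def should_police(path):
    """Only absolute filesystem paths are subject to the default verdict."""
    return path.startswith("/")


def default_deny_verdict(path, allow_prefixes):
    """Default-deny matcher that skips empty and non-absolute pseudo-paths."""
    if not should_police(path):
        return "skip"  # not counted as denied
    if any(path.startswith(p) for p in allow_prefixes):
        return "allow"
    return "deny"


samples = [
    "/lib/x86_64-linux-gnu/libc.so.6",
    "pipe:[123]",
    "anon_inode:[eventfd]",
    "socket:[4567]",
    "",
    "/srv/secret",
]
verdicts = {p: default_deny_verdict(p, ["/lib/", "/etc/"]) for p in samples}
for path, verdict in verdicts.items():
    print(repr(path), verdict)
```

With a guard like this, only real absolute paths outside the allow list would count toward the denied total, so anonymous descriptors would no longer fail block mode.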

For now, switch the file pillar back to `default: allow`. The 8
kernel-blocked deny prefixes still fire SIGKILL on attempted
credential reads regardless of the default verdict — that's the
actual defence-in-depth here, and it's intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb merged commit 018ee87 into main on May 10, 2026
11 checks passed
bokuweb deleted the claude/wizardly-hopper-aa18eb branch on May 10, 2026 08:01
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
Before: a wide CIDR (e.g. Cloudflare's /12) made the supervisor
fail at eBPF attach time with the cryptic

    Error: eBPF programs failed to attach in block mode; refusing
    to run unprotected. attaching programs: `bpf_map_update_elem`
    failed: Argument list too long (os error 7)

…which gives no hint that the cause is the policy itself
exceeding the BPF map size. The userspace warning chain even
suggested 65536 entries per CIDR were fine, while the actual
NET4 / NET6 maps are sized for 1024 each.

Now: pre-flight every resolved (addr, port) pair against the
mirrored `BPF_NET_MAP_CAPACITY` constant before touching the
maps, and bail early with

    Error: policy network rules expand to 460000 IPv4 (addr,port)
    pairs, but the eBPF NET4 map is sized for 1024. Common
    cause: a wide CIDR like Cloudflare's /12s — coronarium
    enumerates every host into the map and CDN ranges overflow
    it quickly. Use hostname rules (kept fresh by
    --dns-refresh-interval) or narrower CIDRs.

The per-insert path also gets `with_context` so any future
overflow that slips past pre-flight surfaces with the offending
addr:port and the capacity number, not just `Argument list too
long`.

Also annotates `MAX_CIDR_EXPANSION` so future readers know its
65536 cap is much larger than the BPF map can hold.
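
The pre-flight idea can be sketched in a few lines. This is an assumed model, not the actual implementation (which is in Rust): `preflight_ipv4`, `PolicyTooLarge`, and the rule format are hypothetical, and the count is computed from the prefix length instead of enumerating hosts.

```python
import ipaddress

BPF_NET_MAP_CAPACITY = 1024  # per-family map size named in the commit


class PolicyTooLarge(Exception):
    """Raised when the policy cannot fit in the map."""


def preflight_ipv4(rules, capacity=BPF_NET_MAP_CAPACITY):
    """Count (addr, port) pairs for IPv4 rules and fail before any insert.

    rules: iterable of (cidr, ports) tuples.
    """
    total = 0
    for cidr, ports in rules:
        net = ipaddress.ip_network(cidr)
        if net.version == 4:
            total += net.num_addresses * len(ports)
    if total > capacity:
        raise PolicyTooLarge(
            f"policy network rules expand to {total} IPv4 (addr,port) "
            f"pairs, but the map is sized for {capacity}"
        )
    return total


print(preflight_ipv4([("127.0.0.53/32", [53])]))
try:
    preflight_ipv4([("104.16.0.0/12", [443])])  # illustrative wide CIDR
except PolicyTooLarge as err:
    print(err)
```

Failing on the computed total lets the error name the policy as the cause, instead of surfacing later as a bare `Argument list too long` from the map update.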

Surfaced from a downstream block-mode policy in
reg-viz/reg-cli#650.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>