ci: audit pnpm test with bokuweb/coronarium #650
Merged
Wraps the `pnpm test` step in `bokuweb/coronarium@v0` (mode: audit) on Linux + Windows so every exec/open/connect made during the CLI + library integration tests is recorded. Nothing is blocked: the .github/coronarium.yml policy is default-allow on both network and file. macOS is left unsupervised because coronarium's supervised run is Linux/Windows only.

Reports are uploaded as `coronarium-report-<os>` artifacts and, on pull requests, the Linux run is mirrored into a PR comment via bokuweb/coronarium/comment@v0.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
sudo replaces PATH with the sudoers `secure_path` even when `-E` is given, so `sudo -E $CORONARIUM_BIN run -- pnpm test` failed with `spawning pnpm: No such file or directory`. Wrap with `env "PATH=$PATH"` so the runner user's PATH reaches the supervised child.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
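As a rough illustration of the fix above, the wrapped invocation can be modelled as an argv list. This is a sketch only: `build_supervised_argv` and the `/opt/coronarium` path are hypothetical, and placing `env` directly after `sudo -E` is an assumption, since the commit message does not show the final command line.

```python
import os


def build_supervised_argv(coronarium_bin, command, path=None):
    """Build a sudo argv that forwards the caller's PATH to the child.

    sudo resets PATH, so PATH is re-injected through `env PATH=...`
    on the far side of sudo (assumed ordering, see lead-in).
    """
    if path is None:
        path = os.environ.get("PATH", "")
    return ["sudo", "-E", "env", f"PATH={path}", coronarium_bin, "run", "--", *command]


# Hypothetical binary location and PATH value, for illustration only.
argv = build_supervised_argv(
    "/opt/coronarium", ["pnpm", "test"], path="/home/runner/bin:/usr/bin"
)
print(" ".join(argv))
```

The point of the design is that `env` runs after sudo has already rewritten the environment, so the PATH it sets is the one the supervised child inherits.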
coronarium report
events by kind
📊 Open the full HTML report locally:
rm -rf /tmp/coronarium-25623378865 && gh run download 25623378865 -R reg-viz/reg-cli -n coronarium-report-ubuntu-latest -D /tmp/coronarium-25623378865 && (open /tmp/coronarium-25623378865/coronarium-report.html 2>/dev/null || xdg-open /tmp/coronarium-25623378865/coronarium-report.html 2>/dev/null || echo "open file:///tmp/coronarium-25623378865/coronarium-report.html")
Requires the |
`pnpm` on Windows is a `.cmd` shim, but Rust's
`Command::new("pnpm")` does not honour PATHEXT and only finds
`pnpm.exe`, so coronarium-win failed to spawn the supervised
child with `program not found`. Wrap with `cmd /c` so the shim
gets resolved.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
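The lookup difference described above can be sketched as follows. This is a simplified model, not coronarium's code or the Windows loader: `resolve_with_pathext` and `resolve_exe_only` are hypothetical helpers, and the default extension list is an assumed, shortened PATHEXT value.

```python
import os
import tempfile


def resolve_with_pathext(name, search_dirs, pathext=".COM;.EXE;.BAT;.CMD"):
    """Return the first file matching `name` plus a PATHEXT extension, else None."""
    extensions = [ext.lower() for ext in pathext.split(";") if ext]
    for directory in search_dirs:
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def resolve_exe_only(name, search_dirs):
    """Model of the failing behaviour: only `<name>.exe` is tried."""
    return resolve_with_pathext(name, search_dirs, pathext=".EXE")


with tempfile.TemporaryDirectory() as bin_dir:
    # Simulate an install that ships only a .cmd shim.
    open(os.path.join(bin_dir, "pnpm.cmd"), "w").close()
    exe_only = resolve_exe_only("pnpm", [bin_dir])
    with_pathext = resolve_with_pathext("pnpm", [bin_dir])
    found_name = os.path.basename(with_pathext)

print(exe_only)    # None: no pnpm.exe exists
print(found_name)  # pnpm.cmd
```

Wrapping with `cmd /c` works for the same reason: the shell performs the PATHEXT-aware search before anything is spawned.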
Audit run (5518 events) showed nothing suspicious: only node / pnpm / corepack stack opens (libc, hostedtoolcache, .cache/node, /proc, /sys, /tmp, /var/lib/waagent for the Azure VM Agent) plus PATH-search probes for `sh`. Zero connect events: pnpm install runs in an earlier unsupervised step, and the wasm CLI tests operate purely on local fixture images.

Tighten the policy accordingly:

- network: default deny (no allow list). 0 connects observed, so any new outbound during tests is a genuine signal worth failing CI on.
- file: default allow + deny-list of sensitive paths (shadow/sudoers/ssh-host-keys/.ssh/.aws/.gnupg/.netrc/.docker/docker.sock + Windows equivalents). A whitelist on disk would be both noisy and brittle for a node + pnpm runtime.
- process.deny_exec: basename block of typical exfil / lateral-movement tools (curl, wget, nc, ssh, scp, rsync, …). Tests only need node + pnpm + sh; anything on this list appearing means something new is running inside the supervised step.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Restructures the matrix job so every `run:` step (corepack enable, build-ui.sh, cargo test, pnpm install, pnpm build, pnpm test) runs inside a single coronarium-supervised invocation on Linux + Windows. macOS keeps the unsupervised path.

The policy is now true default-deny on all three pillars:

- network: explicit allow for 127.0.0.53:53 (systemd-resolved stub), github.com / codeload.github.com / objects.githubusercontent.com (build-ui.sh git clone), registry.npmjs.org (pnpm install), and index.crates.io / static.crates.io / crates.io (cargo deps). Anything else fails the connect at the cgroup eBPF layer.
- file: default deny + a broad FHS allow list (/bin /sbin /lib /lib64 /usr /etc /proc /sys /dev /tmp /run /var /opt /snap /home + C:\ on Windows), with a sensitive-path deny overlay (shadow / sudoers / ssh host keys / .ssh / .aws / .gnupg / .netrc / .docker / docker.sock / /var/run/secrets, plus Windows equivalents). Deny wins over allow, so the credential paths stay locked even though /etc and /home are on the allow list.
- process.deny_exec: basename-match block on curl / wget / nc / ncat / socat / ssh / scp / sftp / rsync / telnet / ftp. Tests legitimately need node, pnpm, sh, bash, git, cargo, vite, and none of the blocked tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
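The "deny wins over allow" ordering for the file pillar can be modelled with a small prefix matcher. This is a sketch of the evaluation order only: `file_verdict` is a hypothetical name, the prefix lists are shortened examples, and the real check runs in the kernel rather than in Python.

```python
def file_verdict(path, allow_prefixes, deny_prefixes, default="deny"):
    """Prefix matcher in which a deny match always beats an allow match."""
    if any(path.startswith(prefix) for prefix in deny_prefixes):
        return "deny"
    if any(path.startswith(prefix) for prefix in allow_prefixes):
        return "allow"
    return default


# Shortened, illustrative lists; the real policy file is not reproduced here.
ALLOW = ["/etc", "/home", "/usr", "/tmp"]
DENY = ["/etc/shadow", "/etc/sudoers", "/home/runner/.ssh"]

print(file_verdict("/etc/hosts", ALLOW, DENY))                    # allow
print(file_verdict("/etc/shadow", ALLOW, DENY))                   # deny (overlay wins)
print(file_verdict("/home/runner/.ssh/id_ed25519", ALLOW, DENY))  # deny (overlay wins)
print(file_verdict("/srv/data", ALLOW, DENY))                     # deny (default)
```

Checking the deny list first is what lets broad entries such as `/etc` and `/home` sit on the allow list without exposing the credential paths beneath them.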
The first block-mode run failed because cargo's git fetch of `image-diff-rs` from github.com hit a different IP than the one captured at coronarium startup: github.com's GeoDNS rotates. coronarium does support hostname re-resolution via `--dns-refresh-interval <secs>`, which additively populates the BPF map; enable it with a 30s interval on Linux + Windows.

Also reorder `file.deny`: coronarium kernel-blocks only the first eight prefixes (`file.deny has more than 8 entries — remaining are audit-tagged only`). Promote the most catastrophic-on-leak paths (shadow / sudoers / .ssh / .aws / .gnupg / .netrc / .docker / docker.sock) into the kernel-enforced slots; the rest stay in the policy as audit-only tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
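Why the ordering of `file.deny` matters can be shown by splitting an ordered list at the eight-entry cap. A minimal sketch: `split_deny_list` is a hypothetical helper and the concrete paths are illustrative stand-ins for the categories named above, not the contents of the real policy file.

```python
KERNEL_DENY_SLOTS = 8  # cap quoted in the coronarium warning above


def split_deny_list(deny_prefixes, slots=KERNEL_DENY_SLOTS):
    """Split an ordered deny list into kernel-enforced and audit-only parts."""
    return deny_prefixes[:slots], deny_prefixes[slots:]


# Illustrative paths only, ordered with the most sensitive entries first.
deny = [
    "/etc/shadow",
    "/etc/sudoers",
    "/home/runner/.ssh",
    "/home/runner/.aws",
    "/home/runner/.gnupg",
    "/home/runner/.netrc",
    "/home/runner/.docker",
    "/var/run/docker.sock",
    "/etc/ssh/ssh_host",
    "/var/run/secrets",
]
enforced, audit_only = split_deny_list(deny)
print(len(enforced), audit_only)
```

Anything that lands in `audit_only` is tagged but not blocked, so only position in the list decides whether an entry offers real protection.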
This was referenced May 10, 2026
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
Hostname-based `network.allow` rules are the dominant CI usage (github.com, registry.npmjs.org, index.crates.io, …), and every one of those hostnames is round-robin DNS. With the previous default of 0 (refresh disabled), a supervised run that resolves github.com at startup and a few seconds later makes a second connect that DNS-resolves to a different IP would have the second connect denied with `Operation not permitted` from the cgroup eBPF hook. That is a footgun everyone copy-pasting the README CI example hits the moment they switch to `mode: block`.

Default to 15s instead. Entries are written additively (never removed), and 15s covers a typical short-running CI matrix without piling up many refreshes; users who really do have an IP-only allow list can opt out with `--dns-refresh-interval 0`.

Refs reg-viz/reg-cli#650, the downstream adopter that surfaced this default.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
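The additive refresh behaviour described above can be sketched with a grow-only set. This is a model of the idea, not coronarium's BPF map handling: `AdditiveAllowSet` is a hypothetical class and the addresses come from the documentation range 192.0.2.0/24.

```python
import ipaddress


class AdditiveAllowSet:
    """Allow set that only grows: refreshes add addresses, never remove them."""

    def __init__(self):
        self._addresses = set()

    def refresh(self, resolved):
        """Merge a fresh batch of resolved addresses; return how many were new."""
        fresh = {ipaddress.ip_address(address) for address in resolved}
        added = fresh - self._addresses
        self._addresses |= added
        return len(added)

    def allows(self, address):
        return ipaddress.ip_address(address) in self._addresses


allow = AdditiveAllowSet()
allow.refresh(["192.0.2.10"])  # resolution at startup
second_connect_before = allow.allows("192.0.2.20")  # rotated IP, not yet known
new_entries = allow.refresh(["192.0.2.20", "192.0.2.10"])  # later refresh
second_connect_after = allow.allows("192.0.2.20")
first_still_allowed = allow.allows("192.0.2.10")
print(second_connect_before, new_entries, second_connect_after, first_still_allowed)
```

With refresh disabled, the state stays as it was after the first call, which is the situation in which the second connect is denied.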
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
`Command::new` on Windows uses CreateProcess, which only matches `<name>.exe`; it does not consult PATHEXT, so a bare `pnpm` (installed by pnpm/action-setup as `pnpm.cmd`), `yarn`, `npm`, or any other `.cmd` shim fails to spawn with `program not found`, even though the user's shell finds it fine.

We already had `resolve_program()` (a `where`-backed lookup) used for installing firewall block rules; route the supervised-run spawn through the same helper so `pnpm test`, `yarn build`, etc. work without users having to wrap with `cmd /c` or hand-resolve absolute paths.

Surfaced from a downstream consumer in reg-viz/reg-cli#650.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
bokuweb/coronarium v0.30.0 ships #41, which makes 15s the default for `--dns-refresh-interval`, so the explicit `--dns-refresh-interval 30` we needed against v0.29.0 is now redundant. Comment updated to reflect the new behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Two failure modes from the first block-mode run with the new default-deny policy:

1. Linux: `git clone` died with `git-remote-https died of signal 9` on its first connect. libcurl (git's HTTPS backend) probes `$HOME/.netrc` on every HTTPS request for credential lookup, and coronarium's kernel-side LSM hook matches on the path string before filesystem resolution, so the deny fires (and sends SIGKILL) even though `.netrc` doesn't exist on a fresh runner. The same probe happens from pnpm and cargo's HTTPS code paths. Drop `.netrc` from `file.deny`; the deny had near-zero value (file is empty/absent) but breaks every HTTPS call.

2. Windows: 51 tests passed but coronarium-win exited non-zero with `policy violation: 417360 events denied`. coronarium-win itself warns `network.default=deny on Windows is audit-only`: Windows Defender Firewall has no clean default-outbound-deny primitive, so coronarium-win tags every TLS connect as denied in the log without actually blocking it, and `--mode block`'s `denied > 0 → exit 1` rule then fails tests that were never blocked. Force `--mode audit` on the Windows run and rely on Linux for real enforcement.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
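The Windows failure follows directly from the exit rule quoted above. A minimal model, assuming that outside block mode the supervisor simply passes through the child's exit code (that part is an assumption, and `supervisor_exit_code` is a hypothetical name):

```python
def supervisor_exit_code(mode, denied_events, child_exit_code):
    """Model of the quoted rule: in block mode, any denied event fails the run."""
    if mode == "block" and denied_events > 0:
        return 1
    return child_exit_code


# 417360 is the denied count reported by the Windows run; the tests themselves passed.
windows_block = supervisor_exit_code("block", 417360, 0)
windows_audit = supervisor_exit_code("audit", 417360, 0)
print(windows_block, windows_audit)  # 1 0
```

Because the Windows denies are tags rather than real blocks, the counter alone decides the outcome, which is why switching that runner to `--mode audit` is enough to unblock CI.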
The first green-tests Linux block-mode run still failed with `policy violation: 7925 events denied`. Audit log shows 57 of those are TCP connects from `node` to `104.16.x.34` and `2606:4700::*`, i.e. Cloudflare anycast IPs. registry.npmjs.org is fronted by Cloudflare and pnpm picks IPs across the /12 that coronarium's 15s DNS refresh can't enumerate (anycast advertises the same prefix from many POPs; each DNS query returns a fresh small subset).

Whitelist Cloudflare's published v4/v6 ranges (https://www.cloudflare.com/ips/) so the CDN-fronted downloads work. This widens the egress allowance to "any service on the same Cloudflare POP", but that is the same effective blast radius we already accepted by allowing the hostname.

(The remaining 3 denied opens with `filename=""` look like a coronarium-side bug: empty path strings fall through allow checks and hit `default: deny`. Will follow up upstream; not a blocker here.)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
The eBPF ringbuf occasionally emits open events with an empty `filename`, typically anonymous mmaps, deleted files, or memfd-style opens where the kernel can't recover an absolute path. With `file.default: deny`, those would fall through every allow prefix check and get tagged as denied, inflating `stats.denied` and tripping block-mode exits on runs where nothing real was blocked.

reg-viz/reg-cli#650 hit this immediately after switching to `default: deny`: 3 of 7925 denied opens had `filename: ""`, emitted by `bash` and `corepack` during normal startup. Once the network-side denies were resolved, the empty-path opens would have been the lone remaining cause of CI failure.

Empty isn't actually a path we can meaningfully police; there is nothing to protect on it. Treat as not-denied at the matcher boundary.

Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
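The guard described above amounts to one early return at the top of the matcher. A sketch under stated assumptions: `is_denied` is a hypothetical stand-in for the real matcher, and the event filenames are invented for illustration.

```python
def is_denied(filename, allow_prefixes, default="deny"):
    """Return True if an open should be counted as denied."""
    if filename == "":
        return False  # empty path: nothing to police, never counted as denied
    if any(filename.startswith(prefix) for prefix in allow_prefixes):
        return False
    return default == "deny"


# Illustrative events: two real paths and two empty-filename opens.
events = ["/usr/bin/bash", "", "/usr/lib/example/corepack.js", ""]
denied = sum(is_denied(name, ["/usr"]) for name in events)
print(denied)  # 0
```

Without the first check, the two empty entries would each count as a deny under `default: deny` and fail a block-mode run on their own.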
Switching the network pillar to `default: deny` failed in two ways:

1. With hostname allows only, pnpm install hit `node → 104.16.x.34` Cloudflare anycast IPs that the 15s DNS refresh hadn't yet added to coronarium's BPF map (registry.npmjs.org is a Cloudflare CDN; queries return a fresh small subset of a /12).

2. With Cloudflare's full published CIDR ranges added, coronarium refused to start: `bpf_map_update_elem failed: Argument list too long`. The userspace MAX_CIDR_EXPANSION caps each CIDR at 65536 entries, but the BPF map itself is `HashMap::with_max_entries(1024)`, 64× smaller, so even one /16 plus the existing rules overflows.

Until coronarium grows an LPM-trie-backed prefix map, network whitelisting isn't feasible for CDN-fronted CI workloads. Drop to `default: allow` and rely on `process.deny_exec` (basename block on curl / wget / nc / ssh / scp / rsync / …) and `file.deny` (kernel-blocked credentials paths) for the actual enforcement; both pillars work as advertised today.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
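The size mismatch in point 2 is plain arithmetic and can be checked with Python's `ipaddress` module. The two constants are the figures quoted in the commit message; the 10.0.0.0 networks are arbitrary examples, and counting every address in the block (network and broadcast included) is an assumption about how the expansion works.

```python
import ipaddress

BPF_MAP_ENTRIES = 1024      # map size quoted above
MAX_CIDR_EXPANSION = 65536  # userspace per-CIDR cap quoted above

slash16 = ipaddress.ip_network("10.0.0.0/16")
slash12 = ipaddress.ip_network("10.0.0.0/12")
slash22 = ipaddress.ip_network("10.0.0.0/22")

print(slash16.num_addresses)                  # addresses in one /16
print(MAX_CIDR_EXPANSION // BPF_MAP_ENTRIES)  # how many times larger the cap is
print(slash12.num_addresses)                  # addresses in one /12
print(slash22.num_addresses)                  # largest single prefix that fits the map
```

Under these assumptions a single /22 already fills the map with nothing else in it, so CDN-sized ranges cannot be enumerated host by host.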
Previous run still failed `policy violation: 7016 events denied in block mode` even after dropping the network whitelist. The JSON log's sample set didn't include any of the denied events (coronarium's sample buffer fills early and misses the bulk of later events), but the most likely cause was the audit-only deny entries past slot 8 (`/root/.ssh`, `/root/.aws`, `/root/.gnupg`, `/etc/ssh/ssh_host`, `/etc/sudoers.d`, `/var/run/secrets`, and four Windows-shaped entries) getting hit by libcurl / libgit2 / pnpm probes (~/.ssh/known_hosts and similar) on every HTTPS connect during the 800-package pnpm install.

Past-slot-8 entries don't actually block (kernel-side cap is 8) but they DO increment `stats.denied`, which fails block mode. Net effect: audit-only deny rules are a pure footgun, giving no protection and certain CI failure.

Trim to exactly 8 entries, all targeting `/home/runner/...` for the runner user's home (or `/etc/...` for system credentials). The `C:\Users\runneradmin\...` Windows duplicates are also removed because the same policy file is consumed on the Windows runner, where coronarium-win does not have a counter-overflow issue but a counter-tag would still create noise; if Windows needs deny rules we'll add a separate policy file later.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
Previous run still hit `policy violation: 7011 events denied` even with the deny list trimmed to exactly 8 kernel-blocked entries. coronarium's sample buffer is biased toward startup events (192 samples of 92152 observed, none with denied=true), so the JSON log doesn't tell us *which* paths were denied, but the count and the fact tests still pass (no SIGKILL) point at one cause:

The supervised process opens many anonymous file descriptors (`pipe:[123]`, `anon_inode:[eventfd]`, `socket:[N]`, `memfd:…`) whose "paths" aren't real filesystem paths. None match `/lib/`, `/etc/`, etc. prefixes, so under `default: deny` they fall through and increment `stats.denied`. pnpm install of ~800 packages spawns thousands of such fds and the run fails block mode every time.

The matcher already has a sibling guard for empty paths (#42 in v0.31.0); a follow-up to also skip non-absolute paths would let `default: deny` work, but that's upstream work. For now, switch the file pillar back to `default: allow`. The 8 kernel-blocked deny prefixes still fire SIGKILL on attempted credential reads regardless of the default verdict. That is the actual defence-in-depth here, and it's intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: bokuweb <[email protected]>
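The fall-through described above, and the effect of the proposed (not yet implemented) non-absolute-path guard, can be compared side by side. A sketch only: `count_denied` is a hypothetical helper, the names are examples of the fd shapes listed in the commit message, and `memfd:example` is a placeholder for the elided memfd name.

```python
def count_denied(names, allow_prefixes, skip_non_absolute):
    """Count opened names that fall through to `default: deny`."""
    denied = 0
    for name in names:
        if skip_non_absolute and not name.startswith("/"):
            continue  # proposed guard: anonymous fd names are not filesystem paths
        if not any(name.startswith(prefix) for prefix in allow_prefixes):
            denied += 1
    return denied


opened = [
    "/lib/x86_64-linux-gnu/libc.so.6",
    "/etc/hosts",
    "pipe:[123]",
    "anon_inode:[eventfd]",
    "socket:[456]",
    "memfd:example",  # placeholder name
]
allow = ["/lib/", "/etc/"]

without_guard = count_denied(opened, allow, skip_non_absolute=False)
with_guard = count_denied(opened, allow, skip_non_absolute=True)
print(without_guard, with_guard)  # 4 0
```

Every anonymous descriptor counts as a deny in the first case even though nothing was blocked, which matches the pattern of passing tests with a non-zero denied counter.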
bokuweb added a commit to bokuweb/sakimori that referenced this pull request May 10, 2026
Before: a wide CIDR (e.g. Cloudflare's /12) made the supervisor
fail at eBPF attach time with the cryptic
    Error: eBPF programs failed to attach in block mode; refusing
    to run unprotected. attaching programs: `bpf_map_update_elem`
    failed: Argument list too long (os error 7)
…which gives no hint that the cause is the policy itself
exceeding the BPF map size. The userspace warning chain even
suggested 65536 entries per CIDR were fine, while the actual
NET4 / NET6 maps are sized for 1024 each.
Now: pre-flight every resolved (addr, port) pair against the
mirrored `BPF_NET_MAP_CAPACITY` constant before touching the
maps, and bail early with
    Error: policy network rules expand to 460000 IPv4 (addr,port)
    pairs, but the eBPF NET4 map is sized for 1024. Common
    cause: a wide CIDR like Cloudflare's /12s — coronarium
    enumerates every host into the map and CDN ranges overflow
    it quickly. Use hostname rules (kept fresh by
    --dns-refresh-interval) or narrower CIDRs.
The per-insert path also gets `with_context` so any future
overflow that slips past pre-flight surfaces with the offending
addr:port and the capacity number, not just `Argument list too
long`.
Also annotates `MAX_CIDR_EXPANSION` so future readers know its
65536 cap is much larger than the BPF map can hold.
Surfaced from a downstream block-mode policy in
reg-viz/reg-cli#650.
Signed-off-by: bokuweb <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
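The pre-flight idea in this commit can be sketched as a count-then-compare step that runs before any map is touched. This is a Python model, not the Rust implementation: `preflight` is a hypothetical function, only the `BPF_NET_MAP_CAPACITY` name and the 1024 figure come from the commit message, and the networks are arbitrary examples.

```python
import ipaddress

BPF_NET_MAP_CAPACITY = 1024  # name and size taken from the commit message


def preflight(cidrs, ports, capacity=BPF_NET_MAP_CAPACITY):
    """Count (addr, port) pairs per IP family and fail before touching any map."""
    totals = {4: 0, 6: 0}
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr)
        totals[network.version] += network.num_addresses * len(ports)
    for version, total in totals.items():
        if total > capacity:
            raise ValueError(
                f"policy network rules expand to {total} IPv{version} "
                f"(addr,port) pairs, but the map is sized for {capacity}"
            )
    return totals


small = preflight(["192.0.2.0/24"], [443])
print(small)

try:
    preflight(["10.0.0.0/12"], [443])
    wide_rejected = False
except ValueError as error:
    wide_rejected = True
    print(error)
```

Failing at this stage lets the error name the policy as the cause, instead of surfacing later as a bare `Argument list too long` from the map update.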
Summary
- `.github/coronarium.yml`: a default-allow audit policy (no blocking).
- `ci.yml`: wraps the `pnpm test` step on Linux and Windows with `bokuweb/coronarium@v0` so every `exec`/`open`/`connect` syscall made by the CLI + library integration tests gets recorded (eBPF on Linux, ETW on Windows). macOS stays unsupervised because coronarium's supervised mode is Linux/Windows only.
- Uploads `coronarium-report.html` + `coronarium.log.json` as a per-OS artifact, and on PRs upserts a single comment with the Linux report via `bokuweb/coronarium/comment@v0`.
- `pull-requests: write` so the comment action can post.

`mode: audit` means nothing fails because of coronarium; this just gives us per-PR visibility into what the test step actually does. Once the audit signal is trusted we can flip individual deny rules on (or move to `mode: block`) in a follow-up.

Test plan

- `ubuntu-latest`, `macos-latest`, `windows-latest`
- `coronarium-report-ubuntu-latest` / `coronarium-report-windows-latest` artifacts are uploaded
- `coronarium-report` comment appears with the Linux verdict counts
- `pnpm test` runtime/output

🤖 Generated with Claude Code