Executive summary
Fix the daily AIC guardrail experience first: it caused 3 of 7 in-window failures and is the dominant red-build source. Two workflows hard-failed at activation because the daily AI-credits guardrail tripped, and one (Code Simplifier) burned 12.3M tokens / 4,219 AIC / 244 turns in a single run before the agent job crashed. The guardrail tripping is the system working as designed, but it surfaces as opaque workflow failures and is already tracked by #38624 and #38645, so no new tracking is needed there. The one genuinely uncovered fix is the Code Simplifier runaway (see sub-issue).
Note on data freshness: the deterministic pre-fetch payload (prefetch.json, generated 2026-06-12 08:14 UTC) listed failed_run_ids: [] and failures: [], missing all 7 in-window failures. Findings below were recovered by querying gh run list --status failure directly. The empty prefetch is itself a reliability gap worth fixing.
Failure cluster table
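Cluster A (AIC guardrail): failed job activation
Cluster B (Code Simplifier runaway): failed job agent
Cluster C (CGO lint-go, out of scope): failed job lint-go
Cluster D (Git Simulator): failed job push_repo_memory
Evidence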
Cluster A: AIC guardrail (activation hard-fails)
Both runs failed in the activation job (agent never ran; detection/safe_outputs skipped):
PR Code Quality Reviewer: ##[error]Daily workflow AIC guardrail exceeded for PR Code Quality Reviewer: 5959.44416/5000.
Test Quality Sentinel: ##[error]Daily workflow AIC guardrail exceeded for Test Quality Sentinel: 5043.04317/5000.
The guardrail behaves correctly, but a tripped guardrail renders as a generic red failure with no distinct status. This is the recurring theme behind #38624 (raise cap) and #38645 (soft pre-cap guard).
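To make the "no distinct status" point concrete, here is a minimal sketch of how the guardrail message quoted above could be told apart from an ordinary failure. It assumes the log line keeps the exact wording shown in the evidence. The names `GuardrailTrip`, `parse_guardrail_trip`, `classify_failure` and the status labels are hypothetical and not part of any existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the guardrail message quoted in the Cluster A evidence.
GUARDRAIL_RE = re.compile(
    r"Daily workflow AIC guardrail exceeded for (?P<workflow>.+): "
    r"(?P<used>\d+(?:\.\d+)?)/(?P<cap>\d+(?:\.\d+)?)"
)


@dataclass(frozen=True)
class GuardrailTrip:
    workflow: str
    used: float
    cap: float

    @property
    def overage_pct(self) -> float:
        """How far past the cap the workflow was, as a percentage of the cap."""
        return (self.used - self.cap) / self.cap * 100.0


def parse_guardrail_trip(log_line: str) -> Optional[GuardrailTrip]:
    """Return the parsed trip, or None if the line is not a guardrail message."""
    m = GUARDRAIL_RE.search(log_line)
    if m is None:
        return None
    return GuardrailTrip(m["workflow"], float(m["used"]), float(m["cap"]))


def classify_failure(log_line: str) -> str:
    """Hypothetical labels: separate guardrail trips from other failures."""
    return "guardrail_tripped" if parse_guardrail_trip(log_line) else "failed"
```

Applied to the two lines above, this would report PR Code Quality Reviewer at roughly 19% over the cap and Test Quality Sentinel at under 1% over, which is the kind of detail a generic red failure hides.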
Cluster B: Code Simplifier runaway
Engine: GitHub Copilot CLI 1.0.60, model claude-sonnet-4.6, scheduled trigger.
The agent job hard-failed; no structured error was captured (the process terminated). A single run consumed ~84% of the 5000 daily AIC budget.
Audit flagged: Resource Heavy For Domain (high), Many Iterations (244 turns), ~50% reducible to deterministic steps.
This is distinct from Cluster A: it is a per-PR/scheduled code-fix workflow with a turn/token runaway, not a tripped guardrail. See sub-issue for the proposed fix.
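As an illustration of what a per-run cap could look like, the sketch below stops a run once a turn or AIC limit is reached and returns a noop result instead of letting the process crash. This is not the fix proposed in the sub-issue. `RunBudget`, `run_agent`, and the limit values in the usage note are hypothetical placeholders; only the 5000 daily cap and the 4,219 AIC figure come from the report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

DAILY_AIC_CAP = 5000  # daily cap quoted in the report


@dataclass(frozen=True)
class RunBudget:
    """Hypothetical per-run limits; real values would need tuning."""
    max_turns: int
    max_aic: float

    def exceeded(self, turns: int, aic: float) -> Optional[str]:
        """Return a reason string once a limit is hit, else None."""
        if turns >= self.max_turns:
            return f"turn budget reached ({turns}/{self.max_turns})"
        if aic >= self.max_aic:
            return f"AIC budget reached ({aic:g}/{self.max_aic:g})"
        return None


def daily_share_pct(aic: float, cap: float = DAILY_AIC_CAP) -> float:
    """Share of the daily cap consumed by one run, in percent."""
    return aic / cap * 100.0


def run_agent(step_costs: Iterable[float], budget: RunBudget) -> dict:
    """Simulate turns with the given AIC cost each.

    The budget is checked before every turn; on a trip the run ends
    with a noop result rather than an unstructured crash.
    """
    turns, aic = 0, 0.0
    for cost in step_costs:
        reason = budget.exceeded(turns, aic)
        if reason is not None:
            return {"status": "noop", "reason": reason, "turns": turns, "aic": aic}
        turns += 1
        aic += cost
    return {"status": "completed", "reason": None, "turns": turns, "aic": aic}
```

For the run described above, `daily_share_pct(4219)` gives 84.38, matching the ~84% figure. With a placeholder budget such as `RunBudget(max_turns=60, max_aic=1000.0)`, a 244-turn run would end early with a structured reason attached.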
Cluster C: CGO lint-go (out of scope)
CGO is the project's main CI pipeline (validate-yaml, test, build, security scans, lint-go, etc.), not an agentic workflow. Both failures were lint-go exiting 1; all other ~24 jobs passed. Excluded from agentic-workflow remediation; flagged here only for completeness.
Cluster D: Git Simulator push_repo_memory
Run 27397597917: activation, agent, detection, safe_outputs, and conclusion all succeeded; the agent reported 4/4 simulator configs PASS and correctly emitted noop. Only the push_repo_memory post-job failed (36s). Single occurrence; likely a transient push/concurrency issue. Monitoring rather than tracking.
audit-diff: no behavioral regressions
Pairwise audit-diff across the failed runs showed has_anomalies: false and anomaly_count: 0 for every pair. The only firewall deltas were expected engine differences (api.anthropic.com vs api.githubcopilot.com + sentry.io). No firewall, MCP, or tooling regression is implicated in any cluster.
Existing issue correlation
#38624 (ai-credits), Raise max-ai-credits for Failure Investigator: directly covers the Cluster A guardrail theme.
#38645 (deep-report), Add a soft pre-cap AI-credits guard to heavy aggregator workflows: covers both Cluster A (graceful pre-cap) and the prevention angle of Cluster B.
… (agentic-workflows): closed by this run as a stale transient self-report.
No duplicate tracking is created for Clusters A/C/D.
Fix roadmap
Code Simplifier turn/token runaway (Cluster B): see sub-issue below.
… activation (Cluster A).
… gh recovery.
push_repo_memory post-job failure (Cluster D); track only if it recurs.
Sub-issues created
References:
6h Review addendum: 2026-06-12 19:25 UTC
Prefetch was empty again (failed_run_ids: [], failures: []) despite 33 failures in the window; findings were recovered via gh run list --status failure. Same reliability gap this report already flagged.
New uncovered fixes filed as sub-issues of this report:
Documentation Unbloat, Git LFS / build:slides (#aw_lfs1): checkout lacks lfs: true, slides PDF is an LFS pointer, docs build exits 1 before the agent runs. Latent in technical-doc-writer, update-astro, visual-regression-checker. Deterministic regression vs 2026-06-11 baseline.
Copilot SDK-driver tool-denial runaway (#aw_deny1): Daily Formal Spec Verifier + Breaking Change Checker hit guard.tool_denials_exceeded (mis-scoped allowlist), burning up to 341 AIC before hard-failing.
Observed but not filed (lower priority / already covered):
PR Code Quality Reviewer (×10) + Matt Pocock Skills Reviewer (×7) failed at activation on the daily AIC guardrail (e.g. 5934.9/5000, 5169.0/5000), turns=0. Already tracked by #38624 ("[perf-improvement] Raise max-ai-credits for Failure Investigator (6h) - meta-monitor blind spot") and #38645 ("[deep-report] Add a soft pre-cap AI-credits guard to heavy aggregator workflows"); volume is PR re-trigger churn.
Daily Issues Report Generator (27425935620): exit 127, Node.js missing in the AWF chroot while launching the experimental Python SDK driver. Experimental-driver gap.
Test Quality Sentinel (27420645901): agent succeeded (26 turns, valid add_comment) but the safe_outputs job failed posting on a GitHub REST API 403 installation rate limit. Transient; consider retry/backoff on safe-output posting.
… 4d9c6ac (PR #38851, "Cap Code Simplifier runaways with hard per-run budgets and graceful noop exit"); no in-window Code Simplifier failure observed. Recommend verifying on the next scheduled run before closing.
Out of scope: CGO (×3), CI (×2), Copilot cloud agent (×3, non-gh-aw), Daily Credit Limit Test (Intentionally Broken). Super Linter Report and Daily Agent of the Day Blog Writer failures were not individually root-caused this pass.
References: §27431567219, §27428819181, §27420645901
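Since the empty prefetch has now recurred, a cross-check between the payload and a direct gh query might be worth automating. The sketch below is one possible shape, under stated assumptions: failed_run_ids is taken to be a list of integer run IDs (the report only shows it empty), and the helper names `list_failed_runs` and `missing_from_prefetch` are hypothetical. The gh invocation uses the `--status`, `--limit`, and `--json` flags of `gh run list`.

```python
import json
import subprocess
from typing import Iterable

# JSON fields requested from `gh run list`.
GH_FIELDS = "databaseId,workflowName,conclusion,createdAt"


def list_failed_runs(limit: int = 100) -> list[dict]:
    """Query failed runs via the GitHub CLI (requires gh to be installed and authenticated)."""
    out = subprocess.run(
        ["gh", "run", "list", "--status", "failure",
         "--limit", str(limit), "--json", GH_FIELDS],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def missing_from_prefetch(prefetch: dict, gh_runs: Iterable[dict]) -> list[int]:
    """Return IDs of failed runs that gh reports but the prefetch payload omits.

    Assumes prefetch["failed_run_ids"] is a list of integer run IDs.
    """
    known = set(prefetch.get("failed_run_ids", []))
    return sorted(r["databaseId"] for r in gh_runs if r["databaseId"] not in known)
```

A non-empty result from `missing_from_prefetch(json.load(open("prefetch.json")), list_failed_runs())` would flag the gap before a report is built on stale data. Filtering the gh results to the report window is left out here and would be needed in practice.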