[aw-failures] [aw] Failure Investigator — 6h Review (2026-06-12 08:14 UTC) #38807

@github-actions

Executive summary

Fix the daily AIC guardrail experience first — it caused 3 of 7 in-window failures and is the dominant red-build source. Two workflows hard-failed at activation because the daily AI-credits guardrail tripped, and one (Code Simplifier) burned 12.3M tokens / 4,219 AIC / 244 turns in a single run before the agent job crashed. The guardrail tripping is the system working as designed, but it surfaces as opaque workflow failures and is already tracked by #38624 and #38645 — no new tracking needed there. The one genuinely uncovered fix is the Code Simplifier runaway (see sub-issue).

Note on data freshness: the deterministic pre-fetch payload (prefetch.json, generated 2026-06-12 08:14 UTC) listed failed_run_ids: [] and failures: [], missing all 7 in-window failures. Findings below were recovered by querying gh run list --status failure directly. The empty prefetch is itself a reliability gap worth fixing.
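The recovery path described above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: the function names `resolve_failures` and `gh_failed_runs`, the limit of 100, and the selected JSON fields are assumptions; only the `failures` key in prefetch.json and the `gh run list --status failure` query come from this report.

```python
from __future__ import annotations

import json
import subprocess
from typing import Callable


def gh_failed_runs(limit: int = 100) -> list[dict]:
    """Query failed runs live via the GitHub CLI (needs an authenticated `gh`)."""
    out = subprocess.run(
        ["gh", "run", "list", "--status", "failure", "--limit", str(limit),
         "--json", "databaseId,workflowName,createdAt"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def resolve_failures(
    prefetch: dict,
    fetch_live: Callable[[], list[dict]] = gh_failed_runs,
) -> tuple[list[dict], str]:
    """Return (failures, source); fall back to a live query if prefetch is empty."""
    failures = prefetch.get("failures") or []
    if failures:
        return failures, "prefetch"
    # An empty payload may be legitimate (no failures) or stale; the live
    # query is cheap enough to run in both cases.
    return fetch_live(), "live"
```

Returning the source alongside the failures lets the report state which path produced its findings, which is how the empty-prefetch gap would stay visible rather than being silently papered over.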

Failure cluster table

| Cluster | Workflow(s) | Run ID(s) | Failing job | Root cause | Priority | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| A: Daily AIC guardrail | PR Code Quality Reviewer, Test Quality Sentinel | 27393233737, 27393233765 | activation | Daily AIC guardrail exceeded (5959.4/5000; 5043.0/5000) | P1 | Tracked: #38624, #38645 |
| B: Agent runaway | Code Simplifier | 27395179213 | agent | Agent hard-fail after 12.3M tok / 4,219 AIC / 244 turns / 32.8m | P1 | Uncovered → sub-issue |
| C: CI lint (out of scope) | CGO | 27393478310, 27393440909 | lint-go | Go lint exit code 1 (non-agentic CI pipeline) | P2 | No action (out of scope) |
| D: Post-job push | Daily Safe Outputs Git Simulator | 27397597917 | push_repo_memory | Agent succeeded (4/4 sim configs PASS); memory-push post-job failed | P2 | Monitor (single/transient) |

Evidence

Cluster A — AIC guardrail (activation hard-fails)

Both runs failed in the activation job (agent never ran; detection/safe_outputs skipped):

  • PR Code Quality Reviewer: ##[error]Daily workflow AIC guardrail exceeded for PR Code Quality Reviewer: 5959.44416/5000.
  • Test Quality Sentinel: ##[error]Daily workflow AIC guardrail exceeded for Test Quality Sentinel: 5043.04317/5000.

The guardrail behaves correctly, but a tripped guardrail renders as a generic red failure with no distinct status. This is the recurring theme behind #38624 (raise cap) and #38645 (soft pre-cap guard).
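One way to give a tripped guardrail a distinct status would be to recognize its error line and extract the numbers. The sketch below assumes the message format is exactly as quoted in the two log lines above; the function name `parse_guardrail_error` and the `over_pct` field are illustrative, not part of any existing tool.

```python
import re

# Pattern derived from the two log lines quoted in this report.
GUARDRAIL_RE = re.compile(
    r"Daily workflow AIC guardrail exceeded for (?P<workflow>.+): "
    r"(?P<used>\d+(?:\.\d+)?)/(?P<cap>\d+(?:\.\d+)?)\.?$"
)


def parse_guardrail_error(line: str):
    """Return workflow, usage, cap, and overage percent, or None if no match."""
    m = GUARDRAIL_RE.search(line.strip())
    if m is None:
        return None
    used, cap = float(m["used"]), float(m["cap"])
    return {
        "workflow": m["workflow"],
        "used": used,
        "cap": cap,
        "over_pct": round(100 * (used - cap) / cap, 1),
    }


line = ("##[error]Daily workflow AIC guardrail exceeded for "
        "PR Code Quality Reviewer: 5959.44416/5000.")
print(parse_guardrail_error(line)["over_pct"])  # 19.2
```

Applied to the two runs in this cluster, the overage is about 19.2% for PR Code Quality Reviewer and about 0.9% for Test Quality Sentinel.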

Cluster B — Code Simplifier runaway
  • Engine: GitHub Copilot CLI 1.0.60, model claude-sonnet-4.6, scheduled trigger.
  • Metrics: 12,306,086 tokens, 4,219.8 AIC, 244 turns, 32.8m wall time, 0 write actions (read-only posture), 670 firewall requests / 0 blocked.
  • agent job hard-failed; no structured error captured (the process terminated). A single run consumed ~84% of the 5000 daily AIC budget.
  • Audit flagged: Resource Heavy For Domain (high), Many Iterations (244 turns), ~50% reducible to deterministic steps.

This is distinct from Cluster A: it is a per-PR/scheduled code-fix workflow with a turn/token runaway, not a tripped guardrail. See sub-issue for the proposed fix.
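The scale of the runaway can be checked with a few lines of arithmetic on the metrics above. In the sketch below the run figures and the 5000 daily cap come from this report; the per-run limits passed to `breached_limits` (100 turns, 1000 AIC) are purely hypothetical placeholders and not the fix proposed in the sub-issue.

```python
from __future__ import annotations

DAILY_AIC_CAP = 5000.0  # daily guardrail cap quoted in this report

# Metrics reported for Code Simplifier run 27395179213.
RUN = {"tokens": 12_306_086, "aic": 4219.8, "turns": 244}


def run_profile(run: dict, daily_cap: float = DAILY_AIC_CAP) -> dict:
    """Summarize a run as share of daily budget and per-turn cost."""
    return {
        "budget_share_pct": round(100 * run["aic"] / daily_cap, 1),
        "tokens_per_turn": round(run["tokens"] / run["turns"]),
        "aic_per_turn": round(run["aic"] / run["turns"], 2),
    }


def breached_limits(run: dict, max_turns: int, max_aic: float) -> list[str]:
    """List which hypothetical per-run limits the run exceeded."""
    breaches = []
    if run["turns"] > max_turns:
        breaches.append("turns")
    if run["aic"] > max_aic:
        breaches.append("aic")
    return breaches


print(run_profile(RUN))
# {'budget_share_pct': 84.4, 'tokens_per_turn': 50435, 'aic_per_turn': 17.29}
print(breached_limits(RUN, max_turns=100, max_aic=1000.0))  # ['turns', 'aic']
```

At roughly 50K tokens and 17 AIC per turn, a turn cap alone would bound the spend of a single run well before it approaches the daily budget.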

Cluster C — CGO lint-go (out of scope)

CGO is the project's main CI pipeline (validate-yaml, test, build, security scans, lint-go, etc.), not an agentic workflow. Both failures were lint-go exiting 1; all other ~24 jobs passed. Excluded from agentic-workflow remediation; flagged here only for completeness.

Cluster D — Git Simulator push_repo_memory

Run 27397597917: activation, agent, detection, safe_outputs, conclusion all succeeded — the agent reported 4/4 simulator configs PASS and correctly emitted noop. Only the push_repo_memory post-job failed (36s). Single occurrence; likely a transient push/concurrency issue. Monitoring rather than tracking.

audit-diff — no behavioral regressions

Pairwise audit-diff across the failed runs showed has_anomalies: false and anomaly_count: 0 for every pair. The only firewall deltas were expected engine differences (api.anthropic.com vs api.githubcopilot.com + sentry.io). No firewall, MCP, or tooling regression is implicated in any cluster.
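To illustrate the kind of pairwise comparison described here, the sketch below computes firewall-domain deltas for every pair of runs and ignores differences explained by the engine. It is not how audit-diff is implemented: the function name, the run labels, the `github.com` entry, and the assignment of domains to runs are all hypothetical; only the three engine domains and the `has_anomalies` / `anomaly_count` field names come from this report.

```python
from itertools import combinations

# Engine-specific endpoints named in this report; deltas limited to these
# are treated as expected rather than anomalous.
EXPECTED_ENGINE_DOMAINS = {"api.anthropic.com", "api.githubcopilot.com", "sentry.io"}


def pairwise_firewall_deltas(domains_by_run: dict) -> dict:
    """Compare the contacted-domain sets of every pair of runs."""
    results = {}
    for a, b in combinations(sorted(domains_by_run), 2):
        delta = domains_by_run[a] ^ domains_by_run[b]  # symmetric difference
        unexpected = delta - EXPECTED_ENGINE_DOMAINS
        results[(a, b)] = {
            "delta": delta,
            "has_anomalies": bool(unexpected),
            "anomaly_count": len(unexpected),
        }
    return results


# Hypothetical input: two runs on different engines.
runs = {
    "run_a": {"api.anthropic.com", "github.com"},
    "run_b": {"api.githubcopilot.com", "sentry.io", "github.com"},
}
print(pairwise_firewall_deltas(runs)[("run_a", "run_b")]["has_anomalies"])  # False
```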

Existing issue correlation

No duplicate tracking is created for Clusters A/C/D.

Fix roadmap

  • P0: None. No P0 failure lacks tracking coverage.
  • P1:
  • P2:
    • Fix the empty deterministic prefetch payload so this workflow does not depend on live gh recovery.
    • Monitor the Git Simulator push_repo_memory post-job failure (Cluster D); track only if it recurs.

Sub-issues created

References:



6h Review addendum — 2026-06-12 19:25 UTC

Prefetch was empty again (failed_run_ids: [], failures: []) despite 33 failures in the window — recovered via gh run list --status failure. Same reliability gap this report already flagged.

New uncovered fixes filed as sub-issues of this report:

  • Documentation Unbloat — Git LFS / build:slides (#aw_lfs1): checkout lacks lfs: true, slides PDF is an LFS pointer, docs build exits 1 before the agent runs. Latent in technical-doc-writer, update-astro, visual-regression-checker. Deterministic regression vs 2026-06-11 baseline.
  • Copilot SDK-driver tool-denial runaway (#aw_deny1): Daily Formal Spec Verifier + Breaking Change Checker hit guard.tool_denials_exceeded (mis-scoped allowlist), burning up to 341 AIC before hard-failing.
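For the Git LFS item above, a pre-build check could catch an unfetched pointer before the docs build exits 1. The sketch relies on the Git LFS pointer format, in which a pointer file is a small text file (under 1024 bytes) whose first line is `version https://git-lfs.github.com/spec/v1`. The function names are illustrative, and no specific file path is assumed.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
LFS_POINTER_MAX_BYTES = 1024  # pointer files are smaller than this per the LFS spec


def is_lfs_pointer(data: bytes) -> bool:
    """True if the bytes look like a Git LFS pointer rather than real content."""
    return len(data) < LFS_POINTER_MAX_BYTES and data.startswith(LFS_POINTER_PREFIX)


def check_asset(path: str) -> bool:
    """Read a file and report whether it is still an unfetched LFS pointer."""
    return is_lfs_pointer(Path(path).read_bytes())


pointer = (b"version https://git-lfs.github.com/spec/v1\n"
           b"oid sha256:" + b"0" * 64 + b"\nsize 12345\n")
print(is_lfs_pointer(pointer))        # True
print(is_lfs_pointer(b"%PDF-1.7\n"))  # False
```

Running such a check as a deterministic step before the build would turn the opaque exit 1 into an explicit message that the checkout did not fetch LFS content.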

Observed but not filed (lower priority / already covered):

References: §27431567219, §27428819181, §27420645901

Generated by 🔍 [aw] Failure Investigator (6h) · 400 AIC · ⌖ 14.2 AIC · ⊞ 5.1K ·
