[reliability] Daily Reliability Review - 2026-06-10 #38460

@github-actions

Executive Summary

Overall 24h health for github/gh-aw is mostly healthy, with a recurring failure tail. The Sentry spans dataset is well populated and successful runs dominate: at least 100 spans carried gh-aw.run.status:success, which hit the query page limit, so that figure is a lower bound. However, 53 failure spans across 15 distinct traces and 9 workflows reported gh-aw.run.status:failure in the window, concentrated in two recurring workflows: Contribution Check and Issue Monster.

No cancelled, timed_out, or error run statuses were observed, and no exporter/auth failures were detectable. Sentry auth (whoami) and ingestion are working; correlation attributes (gh-aw.workflow.name, gh-aw.run.id, gh-aw.engine.id, release) are present and healthy.

Two telemetry caveats temper the failure read:

  • gh-aw.run.status is emitted per-span/per-step, not once per run. A single failed-run trace contains a mix of failure, success, and unset spans (verified in trace f6b4fa4e...), so a failure span confirms at least one failed step, not necessarily whole-run failure.
  • span.status and gen_ai.response.finish_reasons are not queryable in Sentry (has: returns zero), so truncation/runaway-token outcomes are inconclusive, not confirmed-clean.
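
Because gh-aw.run.status is set per span, failure counts are best reported as both failure spans and distinct failed traces. The following is a minimal sketch of that aggregation, not the review's actual tooling: the record shape (`trace`, `workflow`, `status`), the sample rows, and the `summarizeFailures` helper are all hypothetical and only mirror the attribute names used in this report.

```javascript
// Hypothetical span rows; real rows would come from a spans query.
// A single trace can mix failure, success, and unset statuses.
const spans = [
  { trace: "t1", workflow: "Contribution Check", status: "failure" },
  { trace: "t1", workflow: "Contribution Check", status: "success" },
  { trace: "t1", workflow: "Contribution Check", status: undefined },
  { trace: "t2", workflow: "Contribution Check", status: "failure" },
  { trace: "t3", workflow: "Issue Monster", status: "success" },
];

// Count failure spans and distinct failed traces per workflow.
// A failed trace here means "at least one span reported failure",
// which is weaker than "the whole run failed".
function summarizeFailures(rows) {
  const byWorkflow = new Map();
  for (const { trace, workflow, status } of rows) {
    if (status !== "failure") continue;
    let entry = byWorkflow.get(workflow);
    if (!entry) {
      entry = { failureSpans: 0, traces: new Set() };
      byWorkflow.set(workflow, entry);
    }
    entry.failureSpans += 1;
    entry.traces.add(trace);
  }
  return [...byWorkflow].map(([workflow, e]) => ({
    workflow,
    failureSpans: e.failureSpans,
    failedTraces: e.traces.size,
  }));
}

const summary = summarizeFailures(spans);
console.log(summary);
```

Keeping the two counters separate avoids reading a span count as a run count: a trace with many failing steps inflates failure spans but still contributes one failed trace.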

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | Contribution Check | Recurring step failures (gh-aw.run.status:failure) | 5 distinct failed traces, 20 failure spans; top failing gen_ai span 509,415 ms (~8.5 min) in trace f6b4fa4e... | Inspect the run logs of the failing gen_ai step for Contribution Check runs around run 27311...; recurring → investigate workflow logic, not a one-off |
| P1 | Issue Monster | Recurring step failures (gh-aw.run.status:failure) | 3 distinct failed traces, 12 failure spans; top failing gen_ai span 170,528 ms (~2.8 min), run 27275153807, trace f7bcadc4... | Triage the long failing step in run 27275153807; confirm whether failures are deterministic |
| P3 | (cross-cutting) | span.status absent in Sentry; gen_ai.response.finish_reasons not queryable | has:span.status → 0 results; has:gen_ai.response.finish_reasons → 0 results, although emit-side always sets them | Add a queryable scalar (e.g. gen_ai.response.finish_reason string) and verify OTLP status.code → Sentry mapping |
| P3 | (datasets) | errors and logs datasets empty for 24h | list_events on both datasets → "No results found" | Confirm whether the absence of error/log export is intended; if so, no action; otherwise wire up error/log signals |
| P4 | Daily Agent of the Day Blog Writer | Single very long gen_ai span (latency outlier) | gen_ai span 2,806,011 ms (~46.8 min), trace 223d8593... (no run.status captured on the span) | Monitor only: single outlier, not yet systemic; confirm against finish-reason once queryable |

Lower-signal one-off failures (1–2 spans each, not yet recurring): [aw] Failure Investigator (6h), Test Quality Sentinel, PR Sous Chef, Daily CLI Tools Exploratory Tester, Semantic Function Refactoring, Daily AW Cross-Repo Compile Check.
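
The minute figures quoted above follow directly from the millisecond span durations. As a small arithmetic check, the sketch below converts the three durations cited in this review; the `msToMinutes` helper name is an assumption made for illustration.

```javascript
// Convert a span duration in milliseconds to minutes, rounded to one decimal.
function msToMinutes(ms) {
  return (ms / 60000).toFixed(1);
}

// Durations as reported for the top gen_ai spans in this review.
const durationsMs = {
  "Contribution Check": 509415,
  "Issue Monster": 170528,
  "Daily Agent of the Day Blog Writer": 2806011,
};

for (const [workflow, ms] of Object.entries(durationsMs)) {
  console.log(`${workflow}: ${ms} ms is about ${msToMinutes(ms)} min`);
}
```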

Representative Traces

Operational failure — Contribution Check (trace continuity verified: 30 spans, single workflow, gen_ai + http.server + default ops):

  • Trace: https://github.sentry.io/explore/traces/trace/f6b4fa4e694d65c964fc5d03fd314fcb
  • Top failing span 6c04869e7e92d441 (gen_ai), 509,415 ms, gh-aw.run.status:failure
  • Same trace also contains success and unset-status spans (basis for the per-step semantic caveat above)

Operational failure — Issue Monster:

  • Trace: https://github.sentry.io/explore/traces/trace/f7bcadc4c67fe1607baf097fce855ff5
  • Top failing span 20ce9432f0b1672b (gen_ai), 170,528 ms, gh-aw.run.status:failure, gh-aw.run.id:27275153807

Latency outlier — Daily Agent of the Day Blog Writer:

  • Trace: https://github.sentry.io/explore/traces/trace/223d85932a0a6f33206a9e2eff0d5906
  • Span d203b12052086798 (gen_ai), 2,806,011 ms (~46.8 min)

All distinct failed traces (15): 12788ba5..., 2d9a4080..., 53bc7992..., 5b73ab11..., 6b3677a9..., 6b70f9c2..., 6f0720af..., 7fc066d7..., 8ba29072..., 90d65364..., ca273d5e..., da872c80..., dd97a2d9..., f6b4fa4e..., f7bcadc4....

Recommendations

  1. Triage the two recurring failing workflows first (smallest useful fix). Pull run logs for Contribution Check (trace f6b4fa4e...) and Issue Monster (run 27275153807, trace f7bcadc4...). They recur across 5 and 3 distinct traces respectively and are the only clearly systemic, user-visible failures.
  2. Make truncation observable. gen_ai.response.finish_reasons is emitted as an OTLP array attribute (send_otlp_span.cjs:2099), which Sentry does not index for has:/filter queries. Add a parallel scalar attribute (e.g. gen_ai.response.finish_reason) so length/timeout truncation can be queried and alerted on. Until then, truncation status is inconclusive.
  3. Verify the OTLP status.code → Sentry span.status mapping. Emit-side sets statusCode = 2 (ERROR) on failures (send_otlp_span.cjs:1980/2016), yet has:span.status returns nothing in Sentry. Reliability queries currently must rely on gh-aw.run.status; confirm whether the OTLP status is meant to surface as span.status and fix the mapping if so.
  4. Decide intent for the empty errors/logs datasets. Both are empty for 24h. If leaving error/log export disabled is deliberate, no action is needed; otherwise enabling it would add a second, run-status-independent failure signal.
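
Recommendation 2 can be sketched as follows. This is an illustration under stated assumptions, not the code in send_otlp_span.cjs: the `finishReasonAttributes` helper is hypothetical, the scalar key gen_ai.response.finish_reason is the proposed name from this review, and joining multiple reasons with a comma is one possible design choice. The attribute shapes follow the OTLP/JSON encoding (`stringValue`, `arrayValue.values`).

```javascript
// Build OTLP/JSON attributes for finish reasons: the existing array
// attribute plus a parallel scalar string that a backend can index.
// Hypothetical helper; not taken from send_otlp_span.cjs.
function finishReasonAttributes(reasons) {
  const list = Array.isArray(reasons)
    ? reasons.filter((r) => typeof r === "string" && r !== "")
    : [];
  if (list.length === 0) return [];
  return [
    {
      key: "gen_ai.response.finish_reasons",
      value: { arrayValue: { values: list.map((r) => ({ stringValue: r })) } },
    },
    {
      // Proposed scalar; multiple reasons are comma-joined (an assumption).
      key: "gen_ai.response.finish_reason",
      value: { stringValue: list.join(",") },
    },
  ];
}

console.log(JSON.stringify(finishReasonAttributes(["length"]), null, 2));
```

With a scalar in place, a query such as gen_ai.response.finish_reason:length would be the natural way to look for truncation, assuming the backend indexes string attributes the way it already does for gh-aw.run.status.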

Notes

  • Telemetry source: Sentry MCP (list_events), org github, project gh-aw, statsPeriod=24h, region https://us.sentry.io. This MCP build exposes no search_events and no get_trace_details; trace validation used list_events filtered by trace:<id> (skill fallback path).
  • Attribute presence verified explicitly:
    • Present/healthy: gh-aw.workflow.name, gh-aw.run.id (matches GitHub run IDs), gh-aw.engine.id (claude/copilot), release (has:release matched; the CLI tool does not print its value).
    • Not queryable in Sentry: span.status (has: → 0), gen_ai.response.finish_reasons (has: and :length → 0).
    • Field-name clarifications vs. the playbook: identity is gh-aw.workflow.name (not gh_aw.workflow_name); engine is gh-aw.engine.id (not gh-aw.engine); release maps from resource attr service.version.
  • Counting caveat: gh-aw.run.status appears on multiple spans per trace with mixed values; failure counts are reported as failure-spans and distinct failed-traces, not as confirmed whole-run failures.
  • Inconclusive items: timeouts/cancellations (no such run.status values seen, but finish_reasons:timeout is not queryable); truncation/runaway tokens; the 46.8-min gen_ai outlier (single occurrence — monitored, not escalated).
  • Nothing is fabricated: all counts, durations, trace IDs, and run IDs above come directly from list_events results.
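
The query strings and trace links used in this review follow a few fixed patterns. The sketch below assembles them; the helper names are hypothetical, and how a given MCP build passes these strings to list_events is not shown here and should be treated as an open assumption.

```javascript
// Base of the trace links quoted in this review.
const TRACE_URL_BASE = "https://github.sentry.io/explore/traces/trace/";

// Build a trace link from a full 32-character hex trace ID.
// Truncated IDs such as "f6b4fa4e..." are rejected rather than guessed.
function traceUrl(traceId) {
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    throw new Error(`not a full trace ID: ${traceId}`);
  }
  return TRACE_URL_BASE + traceId;
}

// Query fragments in the forms used above.
const traceQuery = (traceId) => `trace:${traceId}`;
const hasQuery = (attribute) => `has:${attribute}`;
const runStatusQuery = (status) => `gh-aw.run.status:${status}`;

console.log(traceUrl("f6b4fa4e694d65c964fc5d03fd314fcb"));
console.log(traceQuery("f6b4fa4e694d65c964fc5d03fd314fcb"));
console.log(hasQuery("span.status"));
console.log(runStatusQuery("failure"));
```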

Generated by 🚨 Daily Reliability Review · 180.5 AIC · ⌖ 13.4 AIC · ⊞ 5.6K ·

  • expires on Jun 12, 2026, 3:27 PM UTC-08:00
