Executive Summary
Over the past 24h, github/gh-aw is mostly healthy, with a recurring failure tail. The Sentry spans dataset is well-populated and successful runs dominate (100+ spans carried gh-aw.run.status:success, hitting the query page limit). However, 53 failure spans across 15 distinct traces and 9 workflows reported gh-aw.run.status:failure in the window, concentrated in two recurring workflows: Contribution Check and Issue Monster.
No cancelled, timed_out, or error run statuses were observed, and no exporter/auth failures were detectable. Sentry auth (whoami) and ingestion are working; correlation attributes (gh-aw.workflow.name, gh-aw.run.id, gh-aw.engine.id, release) are present and healthy.
Two telemetry caveats temper the failure read:
- gh-aw.run.status is emitted per-span/per-step, not once per run. A single failed-run trace contains a mix of failure, success, and unset spans (verified in trace f6b4fa4e...), so a failure span confirms at least one failed step, not necessarily whole-run failure.
- span.status and gen_ai.response.finish_reasons are not queryable in Sentry (has: returns zero results), so truncation/runaway-token outcomes are inconclusive, not confirmed-clean.
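The per-span caveat is why failures in this report are counted two ways: as failure spans and as distinct failed traces. A minimal sketch of that counting follows, assuming span rows have already been fetched into memory. The SpanRow shape, the function names, and the sample trace IDs are hypothetical and chosen for illustration; they are not the list_events response schema.

```typescript
// Hypothetical row shape for illustration; not the actual list_events schema.
interface SpanRow {
  trace: string;      // trace ID
  runStatus?: string; // value of gh-aw.run.status; undefined when unset
}

interface TraceCounts {
  failure: number;
  success: number;
  unset: number;
  other: number; // any other status value, e.g. cancelled or timed_out
}

// Count spans per trace, keeping failure, success, and unset spans apart.
function countByTrace(spans: SpanRow[]): Record<string, TraceCounts> {
  const byTrace: Record<string, TraceCounts> = {};
  for (const span of spans) {
    let counts = byTrace[span.trace];
    if (counts === undefined) {
      counts = { failure: 0, success: 0, unset: 0, other: 0 };
      byTrace[span.trace] = counts;
    }
    if (span.runStatus === undefined) counts.unset += 1;
    else if (span.runStatus === "failure") counts.failure += 1;
    else if (span.runStatus === "success") counts.success += 1;
    else counts.other += 1;
  }
  return byTrace;
}

// A trace is "failed" when at least one of its spans reported failure.
// This confirms a failed step, not necessarily a failed run.
function failedTraceIds(byTrace: Record<string, TraceCounts>): string[] {
  return Object.keys(byTrace)
    .filter((id) => {
      const counts = byTrace[id];
      return counts !== undefined && counts.failure > 0;
    })
    .sort();
}

// Total number of spans that reported failure, across all traces.
function totalFailureSpans(byTrace: Record<string, TraceCounts>): number {
  let total = 0;
  for (const id of Object.keys(byTrace)) {
    const counts = byTrace[id];
    if (counts !== undefined) total += counts.failure;
  }
  return total;
}

// Made-up sample: one mixed trace and one clean trace.
const sampleSpans: SpanRow[] = [
  { trace: "trace-a", runStatus: "failure" },
  { trace: "trace-a", runStatus: "failure" },
  { trace: "trace-a", runStatus: "success" },
  { trace: "trace-a" },
  { trace: "trace-b", runStatus: "success" },
  { trace: "trace-b", runStatus: "success" },
];
const sampleCounts = countByTrace(sampleSpans);
// failedTraceIds(sampleCounts)    -> ["trace-a"]
// totalFailureSpans(sampleCounts) -> 2
```

Keeping the unset bucket separate from success avoids treating spans with no status as evidence of a healthy run.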
Top Reliability Findings
| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | Contribution Check | Recurring step failures (gh-aw.run.status:failure) | 5 distinct failed traces, 20 failure spans; top failing gen_ai span 509,415 ms (~8.5 min) in trace f6b4fa4e... | Inspect the failing gen_ai step's run logs for run 27311...-era Contribution Check; recurring → investigate workflow logic, not a one-off |
| P1 | Issue Monster | Recurring step failures (gh-aw.run.status:failure) | 3 distinct failed traces, 12 failure spans; top failing gen_ai span 170,528 ms (~2.8 min), run 27275153807, trace f7bcadc4... | Triage the long failing step in run 27275153807; confirm whether failures are deterministic |
| P3 | (cross-cutting) | span.status absent in Sentry; gen_ai.response.finish_reasons not queryable | has:span.status → 0 results; has:gen_ai.response.finish_reasons → 0 results, although emit-side always sets them | Add a queryable scalar (e.g. gen_ai.response.finish_reason string) and verify OTLP status.code → Sentry mapping |
| P3 | (datasets) | errors and logs datasets empty for 24h | list_events on both datasets → "No results found" | Confirm whether the empty errors/logs datasets are intentional; if so, no action; otherwise wire up error/log signals |
| P4 | Daily Agent of the Day Blog Writer | Single very long gen_ai span (latency outlier) | gen_ai span 2,806,011 ms (~46.8 min), trace 223d8593... (no run.status captured on the span) | Monitor only: single outlier, not yet systemic; confirm against finish-reason once queryable |
Lower-signal one-off failures (1–2 spans each, not yet recurring): [aw] Failure Investigator (6h), Test Quality Sentinel, PR Sous Chef, Daily CLI Tools Exploratory Tester, Semantic Function Refactoring, Daily AW Cross-Repo Compile Check.
Representative Traces
Operational failure — Contribution Check (trace continuity verified: 30 spans, single workflow, gen_ai + http.server + default ops):
- Trace: https://github.sentry.io/explore/traces/trace/f6b4fa4e694d65c964fc5d03fd314fcb
- Top failing span: 6c04869e7e92d441 (gen_ai, 509,415 ms, gh-aw.run.status:failure)
- The same trace also contains success and unset-status spans (basis for the per-step semantic caveat above)
Operational failure — Issue Monster:
- Trace: https://github.sentry.io/explore/traces/trace/f7bcadc4c67fe1607baf097fce855ff5
- Top failing span: 20ce9432f0b1672b (gen_ai, 170,528 ms, gh-aw.run.status:failure, gh-aw.run.id:27275153807)
Latency outlier — Daily Agent of the Day Blog Writer:
- Trace: https://github.sentry.io/explore/traces/trace/223d85932a0a6f33206a9e2eff0d5906
- Span: d203b12052086798 (gen_ai, 2,806,011 ms, ~46.8 min)
All distinct failed traces (15): 12788ba5..., 2d9a4080..., 53bc7992..., 5b73ab11..., 6b3677a9..., 6b70f9c2..., 6f0720af..., 7fc066d7..., 8ba29072..., 90d65364..., ca273d5e..., da872c80..., dd97a2d9..., f6b4fa4e..., f7bcadc4....
Recommendations
- Triage the two recurring failing workflows first (smallest useful fix). Pull run logs for Contribution Check (trace f6b4fa4e...) and Issue Monster (run 27275153807, trace f7bcadc4...); these recur across 5 and 3 distinct traces respectively and are the only clearly systemic, user-visible failures.
- Make truncation observable. gen_ai.response.finish_reasons is emitted as an OTLP array attribute (send_otlp_span.cjs:2099), which Sentry does not index for has:/filter queries. Add a parallel scalar attribute (e.g. gen_ai.response.finish_reason) so length/timeout truncation can be queried and alerted on. Until then, truncation status is inconclusive.
- Verify the OTLP status.code → Sentry span.status mapping. Emit-side sets statusCode = 2 (ERROR) on failures (send_otlp_span.cjs:1980/2016), yet has:span.status returns nothing in Sentry. Reliability queries currently must rely on gh-aw.run.status; confirm whether the OTLP status is meant to surface as span.status and fix the mapping if so.
- Decide intent for the empty errors/logs datasets. Both are empty for 24h. If the absence of error/log export is deliberate, no action is needed; otherwise enabling export would add a second, run-status-independent failure signal.
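The truncation recommendation above can be sketched as a small attribute builder that emits the existing array attribute plus a parallel scalar. This is an illustration only, not the code in send_otlp_span.cjs: the helper name finishReasonAttributes is hypothetical, the attribute objects follow the OTLP/JSON AnyValue encoding (stringValue, arrayValue.values), and joining multiple reasons with a comma is an assumed convention.

```typescript
// Subset of the OTLP/JSON AnyValue encoding needed for this sketch.
interface OtlpStringValue {
  stringValue: string;
}
interface OtlpArrayValue {
  arrayValue: { values: OtlpStringValue[] };
}
interface OtlpAttribute {
  key: string;
  value: OtlpStringValue | OtlpArrayValue;
}

// Hypothetical helper: keeps the existing array attribute and adds a scalar
// string attribute alongside it so the value can be filtered on directly.
function finishReasonAttributes(reasons: string[]): OtlpAttribute[] {
  const attrs: OtlpAttribute[] = [
    {
      key: "gen_ai.response.finish_reasons",
      value: {
        arrayValue: {
          values: reasons.map((reason) => ({ stringValue: reason })),
        },
      },
    },
  ];
  // Only emit the scalar when there is something to report.
  // Assumption: multiple reasons are joined with "," into one string.
  if (reasons.length > 0) {
    attrs.push({
      key: "gen_ai.response.finish_reason",
      value: { stringValue: reasons.join(",") },
    });
  }
  return attrs;
}

// Example: a response truncated by the token limit.
const truncatedAttrs = finishReasonAttributes(["length"]);
// truncatedAttrs[1] ->
//   { key: "gen_ai.response.finish_reason", value: { stringValue: "length" } }
```

With a scalar in place, a filter such as gen_ai.response.finish_reason:length should become expressible, provided Sentry indexes string attributes on these spans the way it does gh-aw.run.status.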
Notes
- Telemetry source: Sentry MCP (list_events), org github, project gh-aw, statsPeriod=24h, region https://us.sentry.io. This MCP build exposes no search_events and no get_trace_details; trace validation used list_events filtered by trace:<id> (skill fallback path).
- Attribute presence verified explicitly:
  - Present/healthy: gh-aw.workflow.name, gh-aw.run.id (matches GitHub run IDs), gh-aw.engine.id (claude/copilot), release (has:release matched; the CLI tool does not print its value).
  - Not queryable in Sentry: span.status (has: → 0), gen_ai.response.finish_reasons (has: and :length → 0).
- Field-name clarifications vs. the playbook: identity is gh-aw.workflow.name (not gh_aw.workflow_name); engine is gh-aw.engine.id (not gh-aw.engine); release maps from resource attr service.version.
- Counting caveat: gh-aw.run.status appears on multiple spans per trace with mixed values; failure counts are reported as failure-spans and distinct failed-traces, not as confirmed whole-run failures.
- Inconclusive items: timeouts/cancellations (no such run.status values seen, but finish_reasons:timeout is not queryable); truncation/runaway tokens; the 46.8-min gen_ai outlier (single occurrence; monitored, not escalated).
- No fabricated data: all counts, durations, trace IDs, and run IDs above come directly from list_events results.
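The field-name clarifications above can be captured in a small lookup so that queries written against the playbook names resolve to the attribute names actually observed. This is a minimal sketch: FIELD_ALIASES, canonicalField, and filterTerm are hypothetical names, the filter strings follow the key:value and has:key forms used in this report, and quoting values that contain spaces is an assumption about the search syntax.

```typescript
// Playbook field names (left) mapped to the names observed in Sentry (right).
const FIELD_ALIASES: Record<string, string> = {
  "gh_aw.workflow_name": "gh-aw.workflow.name",
  "gh-aw.engine": "gh-aw.engine.id",
};

// Resolve a playbook name to the observed attribute name; unknown names
// pass through unchanged.
function canonicalField(name: string): string {
  const alias = FIELD_ALIASES[name];
  return alias === undefined ? name : alias;
}

// Build one filter term: `has:key` when no value is given, else `key:value`.
// Values containing whitespace are wrapped in double quotes (assumption);
// embedded quotes are not escaped in this sketch.
function filterTerm(field: string, value?: string): string {
  const key = canonicalField(field);
  if (value === undefined) return `has:${key}`;
  const rendered = /\s/.test(value) ? `"${value}"` : value;
  return `${key}:${rendered}`;
}

// Example: failure spans for one workflow, written with the playbook name.
const exampleQuery = [
  filterTerm("gh_aw.workflow_name", "Contribution Check"),
  filterTerm("gh-aw.run.status", "failure"),
].join(" ");
// exampleQuery ->
//   gh-aw.workflow.name:"Contribution Check" gh-aw.run.status:failure
```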
Generated by 🚨 Daily Reliability Review · 180.5 AIC · ⌖ 13.4 AIC · ⊞ 5.6K · ◷