[reliability] Daily Reliability Review - 2026-06-10

### Executive Summary

Overall 24h health for `github/gh-aw` is **mostly healthy with a recurring failure tail**. The Sentry **spans** dataset is well-populated and successful runs dominate (100+ spans carried `gh-aw.run.status:success`, hitting the query page limit). However, **53 failure spans across 15 distinct traces and 9 workflows** reported `gh-aw.run.status:failure` in the window, concentrated in two recurring workflows: **Contribution Check** and **Issue Monster**.

No `cancelled`, `timed_out`, or `error` run statuses were observed, and no exporter/auth failures were detectable. Sentry auth (`whoami`) and ingestion are working; correlation attributes (`gh-aw.workflow.name`, `gh-aw.run.id`, `gh-aw.engine.id`, `release`) are present and healthy.

Two telemetry caveats temper the failure read:
- **`gh-aw.run.status` is emitted per-span/per-step, not once per run.** A single failed-run trace contains a mix of `failure`, `success`, and unset spans (verified in trace `f6b4fa4e...`), so a `failure` span confirms *at least one failed step*, not necessarily whole-run failure.
- **`span.status` and `gen_ai.response.finish_reasons` are not queryable in Sentry** (`has:` returns zero), so truncation/runaway-token outcomes are **inconclusive**, not confirmed-clean.
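
Because `gh-aw.run.status` is per-span, the failure counts in this report are best read as two separate numbers per workflow: failure spans and distinct failed traces. The sketch below shows one way to derive both from span rows. It is illustrative only: the row shape (a dict with a `trace` key plus the attribute names) and the sample rows are assumptions, not the actual `list_events` output format.

```python
from collections import defaultdict

STATUS_KEY = "gh-aw.run.status"
WORKFLOW_KEY = "gh-aw.workflow.name"


def summarize_failures(spans):
    """Count failure spans and distinct failed traces per workflow.

    `spans` is an iterable of dicts; the "trace" key is a hypothetical
    field name used here for illustration.
    """
    failure_spans = defaultdict(int)
    failed_traces = defaultdict(set)
    for span in spans:
        # Spans with status "success" or no status at all are skipped.
        if span.get(STATUS_KEY) != "failure":
            continue
        workflow = span.get(WORKFLOW_KEY, "(unknown)")
        failure_spans[workflow] += 1
        failed_traces[workflow].add(span["trace"])
    return {
        wf: {
            "failure_spans": failure_spans[wf],
            "failed_traces": len(failed_traces[wf]),
        }
        for wf in failure_spans
    }


# Made-up sample rows: one trace mixing failure, success, and unset spans.
sample_spans = [
    {"trace": "t1", WORKFLOW_KEY: "Contribution Check", STATUS_KEY: "failure"},
    {"trace": "t1", WORKFLOW_KEY: "Contribution Check", STATUS_KEY: "success"},
    {"trace": "t1", WORKFLOW_KEY: "Contribution Check"},
    {"trace": "t1", WORKFLOW_KEY: "Contribution Check", STATUS_KEY: "failure"},
    {"trace": "t2", WORKFLOW_KEY: "Issue Monster", STATUS_KEY: "failure"},
]
summary = summarize_failures(sample_spans)
```

In the sample, the single Contribution Check trace yields two failure spans but only one failed trace, which is why neither number alone should be read as a count of failed runs.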

### Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | Contribution Check | Recurring step failures (`gh-aw.run.status:failure`) | 5 distinct failed traces, 20 failure spans; top failing `gen_ai` span **509,415 ms (~8.5 min)** in trace `f6b4fa4e...` | Inspect the run logs of the failing `gen_ai` step in Contribution Check runs from around run `27311...`; the failure recurs, so investigate workflow logic rather than treating it as a one-off |
| P1 | Issue Monster | Recurring step failures (`gh-aw.run.status:failure`) | 3 distinct failed traces, 12 failure spans; top failing `gen_ai` span **170,528 ms (~2.8 min)**, run `27275153807`, trace `f7bcadc4...` | Triage the long failing step in run `27275153807`; confirm whether failures are deterministic |
| P3 | (cross-cutting) | `span.status` absent in Sentry; `gen_ai.response.finish_reasons` not queryable | `has:span.status` &rarr; 0 results; `has:gen_ai.response.finish_reasons` &rarr; 0 results, although emit-side always sets them | Add a queryable scalar (e.g. `gen_ai.response.finish_reason` string) and verify OTLP `status.code`&rarr;Sentry mapping |
| P3 | (datasets) | `errors` and `logs` datasets empty for 24h | `list_events` on both datasets &rarr; "No results found" | Confirm whether error/log export is intended; if so, no action &mdash; otherwise wire up error/log signals |
| P4 | Daily Agent of the Day Blog Writer | Single very long `gen_ai` span (latency outlier) | `gen_ai` span **2,806,011 ms (~46.8 min)**, trace `223d8593...` (no `run.status` captured on the span) | Monitor only &mdash; single outlier, not yet systemic; confirm against finish-reason once queryable |

Lower-signal one-off failures (1&ndash;2 spans each, not yet recurring): `[aw] Failure Investigator (6h)`, `Test Quality Sentinel`, `PR Sous Chef`, `Daily CLI Tools Exploratory Tester`, `Semantic Function Refactoring`, `Daily AW Cross-Repo Compile Check`.
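
The minute figures quoted in the findings table are rounded conversions of the raw millisecond durations. A minimal check of that arithmetic, using a hypothetical helper name:

```python
def ms_to_minutes(duration_ms):
    """Convert a span duration in milliseconds to minutes, rounded to 0.1."""
    return round(duration_ms / 60_000, 1)


# Raw durations taken from the findings table above.
durations_ms = {
    "Contribution Check": 509_415,
    "Issue Monster": 170_528,
    "Daily Agent of the Day Blog Writer": 2_806_011,
}
minutes = {name: ms_to_minutes(ms) for name, ms in durations_ms.items()}
```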

### Representative Traces

<details>
<summary>View representative traces</summary>

**Operational failure &mdash; Contribution Check** (trace continuity verified: 30 spans, single workflow, `gen_ai` + `http.server` + `default` ops):
- Trace: `https://github.sentry.io/explore/traces/trace/f6b4fa4e694d65c964fc5d03fd314fcb`
- Top failing span `6c04869e7e92d441` &mdash; `gen_ai`, **509,415 ms**, `gh-aw.run.status:failure`
- Same trace also contains `success` and unset-status spans (basis for the per-step semantic caveat above)

**Operational failure &mdash; Issue Monster:**
- Trace: `https://github.sentry.io/explore/traces/trace/f7bcadc4c67fe1607baf097fce855ff5`
- Top failing span `20ce9432f0b1672b` &mdash; `gen_ai`, **170,528 ms**, `gh-aw.run.status:failure`, `gh-aw.run.id:27275153807`

**Latency outlier &mdash; Daily Agent of the Day Blog Writer:**
- Trace: `https://github.sentry.io/explore/traces/trace/223d85932a0a6f33206a9e2eff0d5906`
- Span `d203b12052086798` &mdash; `gen_ai`, **2,806,011 ms (~46.8 min)**

All distinct failed traces (15): `12788ba5...`, `2d9a4080...`, `53bc7992...`, `5b73ab11...`, `6b3677a9...`, `6b70f9c2...`, `6f0720af...`, `7fc066d7...`, `8ba29072...`, `90d65364...`, `ca273d5e...`, `da872c80...`, `dd97a2d9...`, `f6b4fa4e...`, `f7bcadc4...`.

</details>

### Recommendations

1. **Triage the two recurring failers first (smallest useful fix).** Pull run logs for **Contribution Check** (trace `f6b4fa4e...`) and **Issue Monster** (run `27275153807`, trace `f7bcadc4...`) &mdash; these recur across 5 and 3 distinct traces respectively and are the only clearly systemic, user-visible failures.
2. **Make truncation observable.** `gen_ai.response.finish_reasons` is emitted as an OTLP **array** attribute (`send_otlp_span.cjs:2099`), which Sentry does not index for `has:`/filter queries. Add a parallel **scalar** attribute (e.g. `gen_ai.response.finish_reason`) so length/timeout truncation can be queried and alerted on. Until then, truncation status is **inconclusive**.
3. **Verify the OTLP `status.code` &rarr; Sentry `span.status` mapping.** Emit-side sets `statusCode = 2 (ERROR)` on failures (`send_otlp_span.cjs:1980/2016`), yet `has:span.status` returns nothing in Sentry. Reliability queries currently must rely on `gh-aw.run.status`; confirm whether the OTLP status is meant to surface as `span.status` and fix the mapping if so.
4. **Decide intent for empty `errors`/`logs` datasets.** Both are empty for 24h. If error/log export is deliberate, no action; otherwise enabling them would add a second, run-status-independent failure signal.
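
Recommendation 2 can be sketched as a small transformation over OTLP/JSON span attributes: find the array-valued `gen_ai.response.finish_reasons` and add a string-valued sibling. This is a sketch under stated assumptions, not the code in `send_otlp_span.cjs`: it is written in Python rather than the emitter's JavaScript, the function name is hypothetical, and joining multiple reasons with a comma is an assumed policy.

```python
ARRAY_KEY = "gen_ai.response.finish_reasons"
SCALAR_KEY = "gen_ai.response.finish_reason"  # proposed name from recommendation 2


def add_scalar_finish_reason(attributes):
    """Return a copy of an OTLP/JSON attribute list with a scalar finish reason.

    Each attribute is {"key": ..., "value": {...}} in OTLP/JSON encoding.
    Multiple reasons are comma-joined (an assumption for illustration).
    """
    result = list(attributes)
    for attr in attributes:
        if attr.get("key") != ARRAY_KEY:
            continue
        values = attr.get("value", {}).get("arrayValue", {}).get("values", [])
        reasons = [v["stringValue"] for v in values if "stringValue" in v]
        if reasons:
            result.append(
                {"key": SCALAR_KEY, "value": {"stringValue": ",".join(reasons)}}
            )
        break
    return result


# Made-up example attribute list with a single "length" finish reason.
example = [
    {
        "key": ARRAY_KEY,
        "value": {"arrayValue": {"values": [{"stringValue": "length"}]}},
    }
]
augmented = add_scalar_finish_reason(example)
```

The array attribute is kept alongside the scalar so existing consumers are unaffected; whether Sentry then indexes the scalar for `has:` queries still needs to be verified against real ingestion.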

### Notes

<details>
<summary>View notes</summary>

- **Telemetry source:** Sentry MCP (`list_events`), org `github`, project `gh-aw`, `statsPeriod=24h`, region `https://us.sentry.io`. This MCP build exposes **no `search_events` and no `get_trace_details`**; trace validation used `list_events` filtered by `trace:<id>` (skill fallback path).
- **Attribute presence verified explicitly:**
  - Present/healthy: `gh-aw.workflow.name`, `gh-aw.run.id` (matches GitHub run IDs), `gh-aw.engine.id` (`claude`/`copilot`), `release` (`has:release` matched; the CLI tool does not print its value).
  - Not queryable in Sentry: `span.status` (`has:` &rarr; 0), `gen_ai.response.finish_reasons` (`has:` and `:length` &rarr; 0).
  - Field-name clarifications vs. the playbook: identity is `gh-aw.workflow.name` (not `gh_aw.workflow_name`); engine is `gh-aw.engine.id` (not `gh-aw.engine`); release maps from resource attr `service.version`.
- **Counting caveat:** `gh-aw.run.status` appears on multiple spans per trace with mixed values; failure counts are reported as failure-spans and distinct failed-traces, not as confirmed whole-run failures.
- **Inconclusive items:** timeouts/cancellations (no such `run.status` values seen, but `finish_reasons:timeout` is not queryable); truncation/runaway tokens; the 46.8-min `gen_ai` outlier (single occurrence &mdash; monitored, not escalated).
- **No fabricated data:** all counts, durations, trace IDs, and run IDs above come directly from `list_events` results.

**References:**
- [&sect;27312594971](https://github.com/github/gh-aw/actions/runs/27312594971) (this review run)
- [&sect;27275153807](https://github.com/github/gh-aw/actions/runs/27275153807) (Issue Monster failed run)

</details>







> Generated by [&#128680; Daily Reliability Review](https://github.com/github/gh-aw/actions/runs/27312594971) &middot; 180.5 AIC &middot; &#8982; 13.4 AIC &middot; &#8862; 5.6K &middot; [&#9719;](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-reliability-review%22&type=issues)
> - [x] expires on Jun 12, 2026, 3:27 PM UTC-08:00
