feat: strengthen agentic audit baselines and execution observability by mnkiefer · Pull Request #22711 · github/gh-aw

mnkiefer · 2026-03-24T14:51:47Z

This PR adds baseline-aware auditing, policy-driven observability checks, and ready-to-use monitoring workflows so teams can detect risky agentic behavior earlier and roll that detection out across repositories.
This matters because agentic workflows need to be judged on behavior, not just outcomes: a workflow can keep passing while becoming more expensive, less controlled, or less predictable. Standard GitHub Actions status does not reveal that.
Logs: better `gh aw logs` reporting across multiple runs
- Shows recurring patterns and trends instead of only one-run summaries
- Surfaces observability insights so users can spot risky, noisy, resource-heavy, or weakly controlled behavior
- Gives a more operational view of workflow behavior over time
- Makes it easier to monitor many executions and identify patterns worth follow-up
Audits: smarter `gh aw audit` output that explains what happened in a run
- Compares a run against a similar successful baseline
- Shows meaningful changes like blocked requests, MCP failures, and execution posture shifts
- Adds task-domain detection to help interpret whether the workflow behavior fits the task
- Adds behavioral fingerprinting to show the overall execution style of a run
- Adds agentic assessments to show whether a workflow looks lean, risky, overbuilt, or weakly controlled
- Adds key findings so the most important conclusions are visible immediately
- Adds recommendations so users know what to investigate or change next
- Adds performance metrics so users can see whether runs are getting slower, costlier, or heavier
- Gives users a faster way to debug regressions and understand why a run is concerning
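To make the baseline comparison concrete, here is a minimal sketch in Go of how a run might be compared against a successful baseline and then classified. It is an illustration under stated assumptions, not the code in `pkg/cli/audit_comparison.go`: the `RunSnapshot` and `Comparison` types, the `compareToBaseline` function, the `improved` label, and the `*_increase` and `mcp_failures_decrease` reason codes are all hypothetical. Only the `turns_decrease`, `blocked_requests_decrease`, and `mcp_failures_resolved` codes and the risky/stable distinction are named elsewhere in this PR.

```go
package main

import "fmt"

// RunSnapshot is a hypothetical, simplified view of one workflow run.
type RunSnapshot struct {
	Turns           int
	BlockedRequests int
	MCPFailures     int
}

// Comparison is a hypothetical result type: a label plus the reasons for it.
type Comparison struct {
	Classification string
	ReasonCodes    []string
}

// compareToBaseline records a reason code for every metric that moved and
// classifies the run. Any regression makes the run "risky"; a run that only
// improved is "improved" (assumed label); only an unchanged run is "stable".
func compareToBaseline(current, baseline RunSnapshot) Comparison {
	var codes []string
	risky := false

	switch {
	case current.Turns > baseline.Turns:
		codes = append(codes, "turns_increase") // assumed code name
		risky = true
	case current.Turns < baseline.Turns:
		codes = append(codes, "turns_decrease")
	}

	switch {
	case current.BlockedRequests > baseline.BlockedRequests:
		codes = append(codes, "blocked_requests_increase") // assumed code name
		risky = true
	case current.BlockedRequests < baseline.BlockedRequests:
		codes = append(codes, "blocked_requests_decrease")
	}

	switch {
	case current.MCPFailures > baseline.MCPFailures:
		codes = append(codes, "mcp_failures_increase") // assumed code name
		risky = true
	case current.MCPFailures == 0 && baseline.MCPFailures > 0:
		codes = append(codes, "mcp_failures_resolved")
	case current.MCPFailures < baseline.MCPFailures:
		codes = append(codes, "mcp_failures_decrease") // assumed code name
	}

	classification := "stable"
	switch {
	case risky:
		classification = "risky"
	case len(codes) > 0:
		classification = "improved"
	}
	return Comparison{Classification: classification, ReasonCodes: codes}
}

func main() {
	baseline := RunSnapshot{Turns: 8}
	current := RunSnapshot{Turns: 12, BlockedRequests: 3}
	c := compareToBaseline(current, baseline)
	fmt.Println(c.Classification, c.ReasonCodes)
}
```

Keeping the reason codes alongside the label means a reader can see why a run was flagged without re-deriving the deltas.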

Agentic Observability Kit

Adds a drop-in observability workflow kit that users can adopt without building monitoring from scratch. A central-reporting variant is also included for teams managing many repositories.
The kit gives teams an operational workflow for turning observability into ongoing reporting and follow-up, and it makes it easier to standardize monitoring across repos instead of inventing a custom setup each time.

Copilot

Pull request overview

Adds richer observability and auditing capabilities for agentic workflow runs, including baseline comparison, execution-path insights, a policy evaluation CLI, and an opt-in GitHub Actions job summary section.

Changes:

- Introduces new JSON schemas for observability reports and observability guardrail policies.
- Adds baseline comparison to the audit report (compare against last successful run) and surfaces observability insights in audit/logs output.
- Adds an opt-in workflow compilation step to generate an “Observability” GitHub Actions job summary section, plus a new `observability-policy` CLI command for policy evaluation.
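As a rough sketch of what evaluating a guardrail policy against an observability payload could look like, the Go example below checks a single threshold rule and reports matches with evidence. Everything here is an assumption for illustration: the `Rule`, `Payload`, and `Match` types, the `evaluate` function, and the `block` action name are hypothetical and are not taken from `pkg/cli/observability_policy.go`. The `min_blocked_requests` rule, the `warn` action, and the `blocked_requests_gte=5 actual=7` evidence format are the only details mentioned in this PR.

```go
package main

import "fmt"

// Rule is a hypothetical policy rule with a single threshold condition.
type Rule struct {
	Name               string
	MinBlockedRequests int    // mirrors the min_blocked_requests condition
	Action             string // assumed values: "info", "warn", "block"
}

// Payload is a hypothetical slice of an observability report.
type Payload struct {
	BlockedRequests int
}

// Match records which rule fired, the action it requests, and why.
type Match struct {
	Rule     string
	Action   string
	Evidence string
}

// evaluate returns every rule whose threshold is met, plus a flag that is
// true when at least one matching rule asks to block (assumed semantics).
func evaluate(rules []Rule, p Payload) ([]Match, bool) {
	var matches []Match
	blocked := false
	for _, r := range rules {
		if r.MinBlockedRequests <= 0 || p.BlockedRequests < r.MinBlockedRequests {
			continue // rule has no threshold or the threshold is not met
		}
		matches = append(matches, Match{
			Rule:   r.Name,
			Action: r.Action,
			// Evidence carries both the threshold and the observed count.
			Evidence: fmt.Sprintf("blocked_requests_gte=%d actual=%d",
				r.MinBlockedRequests, p.BlockedRequests),
		})
		if r.Action == "block" {
			blocked = true
		}
	}
	return matches, blocked
}

func main() {
	rules := []Rule{
		{Name: "noisy-network", MinBlockedRequests: 5, Action: "warn"},
		{Name: "hostile-network", MinBlockedRequests: 50, Action: "block"},
	}
	matches, blocked := evaluate(rules, Payload{BlockedRequests: 7})
	for _, m := range matches {
		fmt.Printf("[%s] %s: %s\n", m.Action, m.Rule, m.Evidence)
	}
	fmt.Println("blocked:", blocked)
}
```

Returning the blocking decision separately from the matches lets a caller print every finding first and only then decide whether to exit with an error.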

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
schemas/observability-report.json	New schema describing run/fleet observability payloads (lineage, execution, reasoning, tooling/network/actuation, insights/findings).
schemas/observability-policy.json	New schema for policy rules evaluated against observability payloads.
pkg/workflow/observability_job_summary_test.go	Tests ensuring an opt-in job-summary step is injected (and absent by default).
pkg/workflow/frontmatter_types_test.go	Adds frontmatter parsing coverage for `observability.job-summary`.
pkg/workflow/frontmatter_types.go	Adds `ObservabilityConfig` and exposes it via `FrontmatterConfig`.
pkg/workflow/compiler_yaml_main_job.go	Hooks observability summary generation into main job step synthesis.
pkg/workflow/compiler_yaml_ai_execution.go	Implements opt-in observability summary step and mode extraction.
pkg/parser/schemas/main_workflow_schema.json	Extends workflow frontmatter schema to allow `observability.job-summary` on/off.
pkg/parser/schema_test.go	Validates schema acceptance of `observability.job-summary`.
pkg/cli/observability_schema_test.go	Verifies the observability report schema includes key sections/defs.
pkg/cli/observability_policy_test.go	Adds unit tests for policy evaluation + schema parsing.
pkg/cli/observability_policy_command_test.go	Tests CLI evaluation output, JSON emission, and blocking behavior.
pkg/cli/observability_policy_command.go	Implements new `observability-policy eval` command and rendering/error behavior.
pkg/cli/observability_policy.go	Adds policy/payload types and matching logic for rule evaluation.
pkg/cli/observability_insights_test.go	Adds tests for audit/logs observability insight generation.
pkg/cli/observability_insights.go	Implements observability insight generation + console rendering helper.
pkg/cli/mcp_schema_test.go	Ensures schema generation works for `ObservabilityPolicy`.
pkg/cli/logs_report.go	Adds aggregated observability insights to logs data and console rendering.
pkg/cli/audit_test.go	Updates audit tests to account for new `Comparison` field.
pkg/cli/audit_report_render.go	Renders new “Comparison” and “Observability Insights” sections in audit output.
pkg/cli/audit_report.go	Adds `Comparison` + `ObservabilityInsights` to audit JSON and populates insights.
pkg/cli/audit_comparison_test.go	Adds tests for comparison behavior (no baseline / risky / stable).
pkg/cli/audit_comparison.go	Implements baseline lookup, snapshot extraction, delta/classification, and recommendations.
pkg/cli/audit.go	Computes current snapshot + attaches baseline comparison into audit output.
cmd/gh-aw/main.go	Registers the new `observability-policy` command under the analysis group.
actions/setup/js/generate_observability_summary.test.cjs	Adds vitest coverage for summary generation and opt-out behavior.
actions/setup/js/generate_observability_summary.cjs	Implements job summary synthesis from runtime artifacts (aw_info, agent_output, gateway logs).

actions/setup/js/generate_observability_summary.cjs

pkg/cli/audit_comparison.go

pkg/cli/observability_insights.go

pkg/cli/observability_policy.go

pkg/cli/observability_policy_command.go

pelikhan · 2026-03-24T17:54:25Z

@copilot review comments, lint go

Co-authored-by: pelikhan <[email protected]> Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c9ee3d6-0a27-4f53-810e-025fc6e26234

Copilot · 2026-03-24T18:06:06Z

@copilot review comments, lint go

Addressed all review comments in commit 5abdb23:

- `countBlockedRequests()`: now sums across all `gatewayEventPaths` files instead of returning after the first match.
- Audit comparison classification: added reason codes for beneficial changes (`turns_decrease`, `blocked_requests_decrease`, `mcp_failures_resolved`) and explicit switch cases so runs that differ from baseline are no longer classified as stable.
- `renderObservabilityInsights`: added critical and low severity icons.
- `min_blocked_requests` evidence: now includes both threshold and actual count, e.g. `blocked_requests_gte=5 actual=7`.
- `warn` action: added explicit `case "warn"` using `FormatWarningMessage` so it renders distinctly from info messages.

Improve agentic audit baselines and execution observability

d01144d

Copilot AI review requested due to automatic review settings March 24, 2026 14:51

Copilot started reviewing on behalf of mnkiefer March 24, 2026 14:52

Merge branch 'main' into obs-tools

a8b86d6

Copilot AI reviewed Mar 24, 2026

mnkiefer self-assigned this Mar 24, 2026

Merge branch 'main' into obs-tools

80942d2

github-actions bot mentioned this pull request Mar 24, 2026

[Contribution Check Report] PR #22711 — feat: improve agentic audit baselines and execution observability #22728

Open

mnkiefer marked this pull request as draft March 24, 2026 17:50

Merge branch 'main' into obs-tools

b22b7cd

Copilot started work on behalf of pelikhan March 24, 2026 17:54

fix: address review comments for observability audit improvements

5abdb23

Copilot AI requested a review from pelikhan March 24, 2026 18:06

Copilot finished work on behalf of pelikhan March 24, 2026 18:07

mnkiefer and others added 3 commits March 24, 2026 19:55

enhance audit comparison and reporting with task domain & behavior fi…

d32e410

…ngerprint

simplify string checks and error handling

89f4ad9

Merge branch 'main' into obs-tools

1b1b538

mnkiefer changed the title ~~feat: improve agentic audit baselines and execution observability~~ feat: strengthen agentic audit baselines and execution observability Mar 24, 2026

pelikhan marked this pull request as ready for review March 24, 2026 20:00

mnkiefer wants to merge 8 commits into main from obs-tools

4 participants
