Source: Local run data (17 runs, Jul 2026)
Estimated cost per run: $0.49 (range: $0.37–$0.57)
Total tokens per run: ~350K effective input + ~850 output
Cache read rate: 85.3% (excellent)
Cache write rate: 13.7% (bimodal: 7.5% warm / 17.1% cold)
LLM turns: 1 (single-shot, already optimal)
⚠️ Critical Finding: Model Aliasing Mismatch
The workflow specifies model: claude-haiku-4-5 in its frontmatter, but all 16 measured runs actually executed on claude-opus-4-8 — which is 5× more expensive.
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| claude-haiku-4-5 (intended) | $1/M | $5/M | $0.10/M | $1.25/M |
| claude-opus-4-8 (actual) | $5/M | $25/M | $0.50/M | $6.25/M |
This is likely because claude-haiku-4-5 is being aliased/upgraded by the Anthropic API to claude-opus-4-8 at runtime.
Current Configuration
| Setting | Value |
|---|---|
| Configured model | claude-haiku-4-5 (frontmatter) |
| Actual runtime model | claude-opus-4-8 (all 16 runs) |
| Tools loaded | bash only (github: false) |
| Tools actually used | 1: bash (single cat call) |
| Network groups | defaults |
| Pre-agent steps | ✅ Yes (5 steps pre-compute all work) |
| Prompt size | 7,092 chars (~1,750 tokens) |
| LLM turns | 1 |
Recommendations
1. 🔴 Fix Model Aliasing — Use claude-haiku-4-5-20251001 (Pinned Date)
Estimated savings: ~$0.39/run (~80%), ~$712/year
The unversioned claude-haiku-4-5 alias is being mapped to claude-opus-4-8 by the Anthropic API. Pin to the explicit dated version or use the version specifier to ensure the intended haiku model is used.
Change in smoke-claude.md frontmatter:
# Current (broken alias):
engine:
  id: claude
  model: claude-haiku-4-5

# Option A: pin to the dated haiku version
engine:
  id: claude
  model: claude-haiku-4-5-20251001

# Option B: use claude-3-5-haiku (dated snapshot)
engine:
  id: claude
  model: claude-3-5-haiku-20241022
After changing, recompile:
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
2. 🟡 Stabilize Cache Write Behavior — Investigate Bimodal Pattern
Estimated savings: ~$0.20/run (~41%) when cache expires
The 12-hour schedule (22 */12 * * *) means scheduled runs fire at 00:22 and 12:22 UTC (minute 22 of every twelfth hour, counting from hour 0). Anthropic's prompt cache TTL is ~5 minutes for standard caching, so the cache is only warm if a previous run, such as a PR-triggered one, ran within the preceding 5 minutes.
The 35K extra tokens written on cold runs ($0.22 extra cost) suggest there are two distinct cache segments — one smaller one (~26K) that represents the truly static system context, and another ~35K segment that represents content written once and expected to be cached from a prior run.
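To make the schedule reasoning above concrete, here is a minimal sketch that expands the minute and hour fields of the cron expression into daily fire times. The helper names (`expand_field`, `daily_fire_times`) are illustrative and not part of any workflow tooling. It only handles `*`, `*/n`, plain numbers, and comma lists, which is enough for this schedule.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field ('*', '*/n', 'a', or 'a,b,c') into sorted values."""
    values: set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif part.startswith("*/"):
            # A step on '*' starts at the lower bound of the field.
            values.update(range(lo, hi + 1, int(part[2:])))
        else:
            values.add(int(part))
    return sorted(values)


def daily_fire_times(cron: str) -> list[str]:
    """Return HH:MM fire times for the minute and hour fields of a cron line."""
    minute, hour, *_rest = cron.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


if __name__ == "__main__":
    times = daily_fire_times("22 */12 * * *")
    print(times)
    # Gap between scheduled runs, compared with the ~5 minute cache TTL.
    gap_minutes = 24 * 60 // len(times)
    print(f"gap between scheduled runs: {gap_minutes} min (TTL is about 5 min)")
```

Because scheduled runs are 720 minutes apart, a scheduled run can only find a warm cache if some other trigger ran shortly before it.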
Investigation steps:
Check whether the 26K vs 61K split correlates with PR trigger vs scheduled trigger
If PR-trigger runs always write 61K (because they fire cold), consider whether the scheduled run frequency could be reduced
3. 🟢 Verify Haiku Task Capability (No Regression Risk)
Estimated savings: confirms $0.39/run savings are safe to claim
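The task is deliberately simple by design: read /tmp/gh-aw/agent/final-result.json (one bash call), then call add_comment or noop based on the result. This is well within claude-haiku-4-5 / claude-3-5-haiku capability. The single LLM turn, minimal tool use, and pre-computed result structure confirm no capability upgrade is needed.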
Verification: Run one PR with the pinned haiku model and confirm:
The add_comment safe output is called with correct data
The verify_token_usage post-job passes
primary_model in agent_usage.json matches the intended haiku variant
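The last verification item could be automated with a small check like the sketch below. It assumes `primary_model` is a top-level string key in `agent_usage.json`, since the issue names the field but does not show the file's schema. The function names and the default prefix are hypothetical.

```python
import json
from pathlib import Path


def model_matches(usage: dict, expected_prefix: str) -> bool:
    """Return True when the recorded primary model starts with the expected prefix."""
    # Assumption: "primary_model" is a top-level string key in agent_usage.json.
    return str(usage.get("primary_model", "")).startswith(expected_prefix)


def check_usage_file(path: str, expected_prefix: str = "claude-haiku-4-5") -> None:
    """Load the usage file and exit non-zero if the model is not the intended one."""
    usage = json.loads(Path(path).read_text(encoding="utf-8"))
    if not model_matches(usage, expected_prefix):
        raise SystemExit(
            f"unexpected primary_model: {usage.get('primary_model')!r} "
            f"(expected prefix {expected_prefix!r})"
        )
    print(f"ok: primary_model = {usage['primary_model']}")
```

Matching on a prefix rather than an exact string lets the same check pass for either the bare name or the dated ID, while still failing on claude-opus-4-8.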
Cache Analysis (Anthropic-Specific)
This is a single-turn workflow — all LLM interaction happens in one request.
| Scenario | Input | Output | Cache Read | Cache Write | Cost |
|---|---|---|---|---|---|
| Warm cache | 3,300 | 854 | 330,971 | 26,945 | $0.37 |
| Cold cache | 3,300 | 854 | 295,903 | 61,769 | $0.57 |
| Average | 3,293 | 854 | 298,768 | 48,079 | $0.49 |
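The per-scenario costs in this table can be reproduced from the token counts and the claude-opus-4-8 rates quoted earlier in this issue. The sketch below is one way to do that arithmetic. The dictionary layout and function names are illustrative only.

```python
# USD per million tokens, taken from the pricing comparison in this issue.
OPUS_RATES = {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25}

# Token counts per scenario, taken from the cache analysis table.
SCENARIOS = {
    "warm": {"input": 3_300, "output": 854, "cache_read": 330_971, "cache_write": 26_945},
    "cold": {"input": 3_300, "output": 854, "cache_read": 295_903, "cache_write": 61_769},
    "average": {"input": 3_293, "output": 854, "cache_read": 298_768, "cache_write": 48_079},
}


def run_cost(tokens: dict[str, int], rates: dict[str, float]) -> float:
    """Cost in USD: sum of tokens times the per-million rate for each category."""
    return sum(tokens[kind] * rates[kind] for kind in rates) / 1_000_000


def cache_shares(tokens: dict[str, int]) -> tuple[float, float]:
    """Return (read share, write share) of effective input tokens."""
    effective_input = tokens["input"] + tokens["cache_read"] + tokens["cache_write"]
    return tokens["cache_read"] / effective_input, tokens["cache_write"] / effective_input


if __name__ == "__main__":
    for name, tokens in SCENARIOS.items():
        read_share, write_share = cache_shares(tokens)
        print(
            f"{name}: ${run_cost(tokens, OPUS_RATES):.2f} "
            f"(cache read {read_share:.1%}, cache write {write_share:.1%})"
        )
```

Here "effective input" is taken to mean uncached input plus cache reads plus cache writes, which is the definition that reproduces the 85.3% and 13.7% rates quoted in the summary.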
Cache write amortization: With a 5-minute cache TTL, the 26K–61K tokens written per run are not reused across runs (runs are hours apart). The cache reads (296K–331K tokens per run, 299K on average) appear to come from Anthropic's own infrastructure caching the tool schemas and system context across runs within the TTL window.
Cache cost vs benefit:
Cache writes cost $6.25/M tokens (opus-4-8 rate)
Cache reads cost $0.50/M tokens
For the 61K cold-write case: write cost = $0.386. The ~$1.65 figure is what the ~330K warm-run cache reads would cost uncached at $5/M; at the cached rate they cost ~$0.17, a saving of ~$1.49
At the haiku-4-5 rate ($1.25/M write, $0.10/M read), this same 61K write costs only $0.077, which makes the caching economics dramatically better
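The cost-versus-benefit arithmetic above can be checked with a few lines. This sketch uses the exact token counts from the cache analysis table (61,769 written on a cold run, 330,971 read on a warm run), so results may differ from the rounded figures in the last digit. The function name `token_cost` is illustrative.

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a token count at a per-million-token rate."""
    return tokens * usd_per_million / 1_000_000


COLD_WRITE_TOKENS = 61_769   # cache write on a cold run
WARM_READ_TOKENS = 330_971   # cache read on a warm run

opus_write = token_cost(COLD_WRITE_TOKENS, 6.25)    # opus-4-8 cache write rate
haiku_write = token_cost(COLD_WRITE_TOKENS, 1.25)   # haiku-4-5 cache write rate
uncached_read = token_cost(WARM_READ_TOKENS, 5.00)  # same tokens at the opus input rate
cached_read = token_cost(WARM_READ_TOKENS, 0.50)    # opus-4-8 cache read rate

if __name__ == "__main__":
    print(f"cold write at opus rate:  ${opus_write:.3f}")
    print(f"cold write at haiku rate: ${haiku_write:.3f}")
    print(f"read saving from caching: ${uncached_read - cached_read:.2f}")
```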
Expected Impact
| Metric | Current | Projected (after fix) | Savings |
|---|---|---|---|
| Total tokens/run | 350K eff. input | 350K eff. input | 0% |
| Cost/run | $0.49 | $0.10 | -80% |
| AIC/run | 47.1 | ~9.4 (est.) | -80% |
| LLM turns | 1 | 1 | 0 |
| Session time | ~5.5m | ~5.5m | ~0% |
| Annual cost | ~$890 | ~$178 | -$712 |
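The projected figures follow from the fact that every claude-haiku-4-5 rate quoted in this issue is one fifth of the corresponding claude-opus-4-8 rate. The sketch below reproduces them under the assumption, stated in the table, that token counts per run do not change with the model. Names are illustrative.

```python
OPUS = {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25}
HAIKU = {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25}


def uniform_price_ratio(expensive: dict[str, float], cheap: dict[str, float]) -> float:
    """Return the single price ratio shared by all categories, or raise if they differ."""
    ratios = {round(expensive[kind] / cheap[kind], 6) for kind in expensive}
    if len(ratios) != 1:
        raise ValueError(f"rates do not scale uniformly: {sorted(ratios)}")
    return ratios.pop()


ratio = uniform_price_ratio(OPUS, HAIKU)

CURRENT_COST_PER_RUN = 0.49   # USD, from the summary
CURRENT_ANNUAL_COST = 890.0   # USD, from the impact table

projected_cost_per_run = CURRENT_COST_PER_RUN / ratio
projected_annual_cost = CURRENT_ANNUAL_COST / ratio

if __name__ == "__main__":
    print(f"price ratio: {ratio:g}x")
    print(f"projected cost/run: ${projected_cost_per_run:.2f}")
    print(f"projected annual cost: ${projected_annual_cost:.0f}")
    print(f"annual savings: ${CURRENT_ANNUAL_COST - projected_annual_cost:.0f}")
```

A uniform ratio is what lets the projection be a simple division. If the rates scaled differently per category, the cost would have to be recomputed from the token breakdown instead.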
Implementation Checklist
Confirm which dated haiku model ID resolves correctly: claude-haiku-4-5-20251001 or claude-3-5-haiku-20241022
Update model: in .github/workflows/smoke-claude.md to the pinned haiku model ID
Recompile: gh aw compile .github/workflows/smoke-claude.md
Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
Confirm agent_usage.json shows the intended haiku model
Confirm the verify_token_usage job passes (token budget check)
Target Workflow: smoke-claude
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
awmgmcpg
See Network Configuration for more information.