⚡ Claude Token Optimization 2026-07-03: smoke-claude #5864

## Target Workflow: `smoke-claude`

**Source:** Local run data (17 runs, Jul 2026)
**Estimated cost per run:** $0.49 (range: $0.37–$0.57)
**Total tokens per run:** ~350K effective input + ~850 output
**Cache read rate:** 85.3% (excellent)
**Cache write rate:** 13.7% (bimodal: 7.5% warm / 17.1% cold)
**LLM turns:** 1 (single-shot, already optimal)
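The read and write rates above can be reproduced from the per-scenario token counts in the Cache Analysis table. This is a minimal sketch that assumes "effective input" means uncached input plus cache reads plus cache writes (output excluded), which matches the ~350K total quoted above.

```python
# Token counts per scenario, copied from the Cache Analysis table.
SCENARIOS = {
    "warm": {"input": 3_300, "cache_read": 330_971, "cache_write": 26_945},
    "cold": {"input": 3_300, "cache_read": 295_903, "cache_write": 61_769},
    "average": {"input": 3_293, "cache_read": 298_768, "cache_write": 48_079},
}


def cache_rates(tokens: dict[str, int]) -> tuple[float, float]:
    """Return (read_rate, write_rate) as fractions of effective input.

    Effective input is assumed to be uncached input + cache reads + cache writes.
    """
    effective = tokens["input"] + tokens["cache_read"] + tokens["cache_write"]
    return tokens["cache_read"] / effective, tokens["cache_write"] / effective


for name, tokens in SCENARIOS.items():
    read, write = cache_rates(tokens)
    print(f"{name:>7}: read {read:.1%}, write {write:.1%}")
```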

## ⚠️ Critical Finding: Model Aliasing Mismatch

The workflow specifies `model: claude-haiku-4-5` in its frontmatter, but all 16 measured runs actually executed on **`claude-opus-4-8`**, which is **5× more expensive** at every rate (input, output, cache read, and cache write).

| Model | Input | Output | Cache Read | Cache Write |
|-------|------:|-------:|-----------:|------------:|
| `claude-haiku-4-5` (intended) | $1/M | $5/M | $0.10/M | $1.25/M |
| `claude-opus-4-8` (actual) | $5/M | $25/M | $0.50/M | $6.25/M |

This is likely because the Anthropic API is resolving the unversioned `claude-haiku-4-5` alias to `claude-opus-4-8` at runtime.

## Current Configuration

| Setting | Value |
|---------|-------|
| Configured model | `claude-haiku-4-5` (frontmatter) |
| Actual runtime model | `claude-opus-4-8` (all 16 runs) |
| Tools loaded | `bash` only (`github: false`) |
| Tools actually used | 1: `bash` (single `cat` call) |
| Network groups | `defaults` |
| Pre-agent steps | ✅ Yes (5 steps pre-compute all work) |
| Prompt size | 7,092 chars (~1,750 tokens) |
| LLM turns | 1 |

## Recommendations

### 1. 🔴 Fix Model Aliasing: Use `claude-haiku-4-5-20251001` (Pinned Date)

**Estimated savings: ~$0.39/run (~80%), ~$712/year**

The unversioned `claude-haiku-4-5` alias appears to be resolving to `claude-opus-4-8` at runtime. Pinning the explicit dated snapshot ID (Option A below) should ensure the intended Haiku model is used; Option B falls back to an older dated Haiku snapshot.

**Change in `smoke-claude.md` frontmatter:**
```yaml
# Current (resolves to claude-opus-4-8 at runtime):
# engine:
#   id: claude
#   model: claude-haiku-4-5

# Option A: pin to the dated Haiku 4.5 snapshot (recommended)
engine:
  id: claude
  model: claude-haiku-4-5-20251001

# Option B: fall back to the dated Claude 3.5 Haiku snapshot
# engine:
#   id: claude
#   model: claude-3-5-haiku-20241022
```

After changing, recompile:
```bash
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
```
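To keep an unpinned alias from creeping back in, a CI step could scan the workflow frontmatter for `model:` values that lack a dated suffix. This is a hypothetical guard (not part of `gh aw`); it uses plain regexes instead of a YAML parser, so it only recognises simple `model: <id>` lines and ignores commented-out ones.

```python
import re
from pathlib import Path

# Matches an uncommented `model:` line, e.g. "  model: claude-haiku-4-5".
MODEL_LINE = re.compile(r"^\s*model:\s*(\S+)\s*$", re.MULTILINE)
# A dated snapshot ends in an 8-digit YYYYMMDD suffix, e.g. "-20251001".
DATED_SUFFIX = re.compile(r"-\d{8}$")


def unpinned_models(markdown: str) -> list[str]:
    """Return frontmatter model IDs that have no dated suffix."""
    parts = markdown.split("---")
    # Frontmatter sits between the first two '---' delimiters.
    frontmatter = parts[1] if len(parts) >= 3 else ""
    models = [m.strip("'\"") for m in MODEL_LINE.findall(frontmatter)]
    return [m for m in models if not DATED_SUFFIX.search(m)]


def check_workflow(path: str) -> list[str]:
    """Read a workflow file and return any unpinned model IDs."""
    return unpinned_models(Path(path).read_text())
```

Calling `check_workflow(".github/workflows/smoke-claude.md")` should return an empty list once Option A (or B) is in place.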

### 2. 🟡 Stabilize Cache Write Behavior: Investigate Bimodal Pattern

**Estimated savings: ~$0.20/run (~41% of the $0.49 average) on cold-cache runs, at current `claude-opus-4-8` rates**

Cache writes show a bimodal distribution:
- **Warm cache** (7.5% write rate): 26,322–26,945 tokens written → AIC ~35
- **Cold cache** (17.1% write rate): 61,769–61,818 tokens written → AIC ~55

The 12-hour schedule (`22 */12 * * *`) means scheduled runs fire at 00:22 and 12:22 UTC, because the hour field `*/12` matches hours 0 and 12. Anthropic's prompt cache TTL is ~5 minutes for standard caching, so the cache is only warm if a _previous PR-triggered run_ ran within the preceding 5 minutes.
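The fire times can be checked by expanding the cron fields. This is a minimal sketch that handles only the two field shapes used here (a literal number or `*/N`); it is not a general cron parser.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand a cron field that is either '*/N' or a literal number."""
    if field.startswith("*/"):
        step = int(field[2:])
        # '*/N' starts at the field's minimum and steps by N up to the maximum.
        return list(range(lo, hi + 1, step))
    return [int(field)]


def daily_fire_times(expr: str) -> list[str]:
    """Return HH:MM (UTC) fire times for an 'M H * * *' expression."""
    minute_f, hour_f, *_ = expr.split()
    minutes = expand_field(minute_f, 0, 59)
    hours = expand_field(hour_f, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


print(daily_fire_times("22 */12 * * *"))  # ['00:22', '12:22']
```

Twelve hours between scheduled runs is far longer than the 5-minute TTL, so a scheduled run can only find a warm cache if some other run touched the same prefix just before it.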

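The ~35K extra tokens written on cold runs ($0.22 of extra write cost) suggest two distinct cache segments: a smaller ~26K segment that represents the truly static system context, and a ~35K segment that is normally read from a prior run's cache but must be rewritten once that cache has expired. Warm and cold runs process nearly the same total (~361K tokens); the cold run simply moves ~35K tokens from the cache-read column (330,971 → 295,903) to the cache-write column (26,945 → 61,769).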

**Investigation steps:**
1. Check whether the 26K vs 61K split correlates with PR trigger vs scheduled trigger
2. If PR-trigger runs always write 61K (because they fire cold), consider whether the scheduled run frequency could be reduced

### 3. 🟢 Verify Haiku Task Capability (No Regression Risk)

**Estimated savings: none directly; confirms the ~$0.39/run savings from recommendation 1 are safe to claim**

The task is deliberately simple:
1. Read pre-computed `/tmp/gh-aw/agent/final-result.json` (one bash call)
2. Call `add_comment` or `noop` based on result

This is well within `claude-haiku-4-5` / `claude-3-5-haiku` capability. The single LLM turn, minimal tool use, and pre-computed result structure suggest that no capability upgrade is needed.

**Verification:** Run one PR with the pinned haiku model and confirm:
- The `add_comment` safe output is called with correct data
- The `verify_token_usage` post-job passes
- `primary_model` in `agent_usage.json` matches the intended haiku variant
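The `primary_model` check can be scripted. This is a minimal sketch that assumes `agent_usage.json` is a JSON object with a top-level `primary_model` string; the file's actual layout is not shown in this report, so the lookup may need adjusting.

```python
import json
from pathlib import Path


def check_primary_model(usage: dict, expected_prefix: str = "claude-haiku-4-5") -> bool:
    """Return True if the recorded primary model starts with the intended ID.

    Assumes a top-level "primary_model" key (an assumption about the file format).
    """
    model = usage.get("primary_model")
    return isinstance(model, str) and model.startswith(expected_prefix)


def check_usage_file(path: str, expected_prefix: str = "claude-haiku-4-5") -> bool:
    """Load agent_usage.json from disk and apply check_primary_model."""
    usage = json.loads(Path(path).read_text())
    return check_primary_model(usage, expected_prefix)
```

A prefix match accepts both the bare alias and its dated snapshot while still rejecting `claude-opus-4-8`; pass `expected_prefix="claude-3-5-haiku"` if Option B is chosen.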

## Cache Analysis (Anthropic-Specific)

This is a **single-turn workflow** &mdash; all LLM interaction happens in one request.

| Scenario | Input | Output | Cache Read | Cache Write | Cost |
|----------|------:|-------:|-----------:|------------:|------:|
| Warm cache | 3,300 | 854 | 330,971 | 26,945 | $0.37 |
| Cold cache | 3,300 | 854 | 295,903 | 61,769 | $0.57 |
| Average | 3,293 | 854 | 298,768 | 48,079 | $0.49 |
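The Cost column can be reproduced by multiplying each token count by its per-million rate. This sketch uses only the rates and counts already given in this report.

```python
# Per-million-token rates (USD) from the model pricing table.
RATES = {
    "claude-opus-4-8": {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25},
}

# Token counts per scenario, from the table above.
SCENARIOS = {
    "warm": {"input": 3_300, "output": 854, "cache_read": 330_971, "cache_write": 26_945},
    "cold": {"input": 3_300, "output": 854, "cache_read": 295_903, "cache_write": 61_769},
    "average": {"input": 3_293, "output": 854, "cache_read": 298_768, "cache_write": 48_079},
}


def run_cost(tokens: dict[str, int], rates: dict[str, float]) -> float:
    """Cost in USD: sum over token categories of count * rate / 1M."""
    return sum(tokens[k] * rates[k] for k in tokens) / 1_000_000


for name, tokens in SCENARIOS.items():
    opus = run_cost(tokens, RATES["claude-opus-4-8"])
    haiku = run_cost(tokens, RATES["claude-haiku-4-5"])
    print(f"{name:>7}: opus ${opus:.2f}  haiku ${haiku:.2f}")
```

Because every Haiku 4.5 rate in the pricing table is exactly one fifth of the matching Opus rate, the projected cost is the current cost divided by 5, which is where the ~80% saving comes from.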

**Cache write amortization:** With a 5-minute cache TTL, the 26K–61K tokens written per run are **not reused across runs** (runs are hours apart). The cache reads (296K–331K tokens) must therefore be served by entries written within the preceding 5 minutes, such as the tool schemas and system context that Anthropic's prompt cache retains from other recent requests sharing the same prefix.

**Cache cost vs benefit:**
- Cache writes cost $6.25/M tokens (opus-4-8 rate)
- Cache reads cost $0.50/M tokens  
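- Cold run: the 61K-token write costs $0.386, while serving its 296K cache-read tokens from cache costs ~$0.15 instead of ~$1.48 as uncached input at $5/M (the warm run's 331K reads would cost ~$1.65 uncached)
- At the **haiku-4-5 rate** ($1.25/M write, $0.10/M read), the same 61K write costs only ~$0.077. The ratios are unchanged (writes cost 1.25× and reads 0.1× the base input rate for both models), so the saving comes entirely from the 5× lower base price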
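Because the write premium and read discount are the same multiples of the base input rate for both models, caching pays off after the same number of reuses on either model; only the dollar amounts shrink. A minimal break-even sketch using the 5-minute-cache rates from the pricing table:

```python
# Per-million-token rates (USD) from the pricing table.
OPUS = {"input": 5.00, "cache_read": 0.50, "cache_write": 6.25}
HAIKU = {"input": 1.00, "cache_read": 0.10, "cache_write": 1.25}
COLD_WRITE_TOKENS = 61_769  # cold-run cache write from the table above


def cache_multipliers(rates: dict[str, float]) -> tuple[float, float]:
    """Return (write premium, read discount) relative to the base input rate."""
    return rates["cache_write"] / rates["input"], rates["cache_read"] / rates["input"]


def breakeven_reads(rates: dict[str, float]) -> float:
    """Reuses needed before caching a prefix beats sending it uncached.

    Caching costs an extra (write - input) once and saves (input - read) per reuse.
    """
    extra_write = rates["cache_write"] - rates["input"]
    saving_per_read = rates["input"] - rates["cache_read"]
    return extra_write / saving_per_read


for name, rates in (("opus-4-8", OPUS), ("haiku-4-5", HAIKU)):
    write_cost = COLD_WRITE_TOKENS * rates["cache_write"] / 1_000_000
    premium, discount = cache_multipliers(rates)
    print(f"{name}: write ${write_cost:.3f}, premium {premium:.2f}x, "
          f"read {discount:.2f}x, break-even after {breakeven_reads(rates):.2f} reads")
```

Both models break even after well under one reuse, so a single cache hit within the TTL already pays for the write.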

## Expected Impact

| Metric | Current | Projected (after fix) | Savings |
|--------|---------|----------------------|---------|
| Total tokens/run | 350K eff. input | 350K eff. input | 0% |
| Cost/run | $0.49 | $0.10 | -80% |
| AIC/run | 47.1 | ~9.4 (est.) | -80% |
| LLM turns | 1 | 1 | 0 |
| Session time | ~5.5m | ~5.5m | ~0% |
| Annual cost | ~$890 | ~$178 | -$712 |

## Implementation Checklist

- [ ] Confirm which dated Haiku model ID resolves correctly: `claude-haiku-4-5-20251001` or `claude-3-5-haiku-20241022`
- [ ] Update `model:` in `.github/workflows/smoke-claude.md` to the pinned Haiku model ID
- [ ] Recompile: `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Verify CI passes and `agent_usage.json` shows the intended haiku model
- [ ] Confirm `verify_token_usage` job passes (token budget check)
- [ ] Compare AIC on new run vs baseline (expect ~47 → ~9)
- [ ] Investigate bimodal cache write pattern (PR vs scheduled trigger correlation)




> [!WARNING]
> <details>
> <summary>Firewall blocked 1 domain</summary>
>
> The following domain was blocked by the firewall during workflow execution:
>
> - `awmgmcpg`
>
> To allow this domain, add it to the `network.allowed` list in your workflow frontmatter:
>
> ```yaml
> network:
>   allowed:
>     - defaults
>     - "awmgmcpg"
> ```
>
> See [Network Configuration](https://github.github.com/gh-aw/reference/network/) for more information.
>
> </details>


> Generated by [Daily Claude Token Optimization Advisor](https://github.com/github/gh-aw-firewall/actions/runs/28646042958) · 150.5 AIC · ⊞ 6.3K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw-firewall+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw-firewall%2Fclaude-token-optimizer%22&type=issues)
