
⚡ Claude Token Optimization 2026-07-03: smoke-claude #5864

Description

@github-actions

Target Workflow: smoke-claude

Source: Local run data (17 runs, Jul 2026)
Estimated cost per run: $0.49 (range: $0.37–$0.57)
Total tokens per run: ~350K effective input + ~850 output
Cache read rate: 85.3% (excellent)
Cache write rate: 13.7% (bimodal: 7.5% warm / 17.1% cold)
LLM turns: 1 (single-shot, already optimal)

⚠️ Critical Finding: Model Aliasing Mismatch

The workflow specifies model: claude-haiku-4-5 in its frontmatter, but all 16 measured runs actually executed on claude-opus-4-8 — which is 5× more expensive.

| Model | Input | Output | Cache Read | Cache Write |
| --- | --- | --- | --- | --- |
| claude-haiku-4-5 (intended) | $1/M | $5/M | $0.10/M | $1.25/M |
| claude-opus-4-8 (actual) | $5/M | $25/M | $0.50/M | $6.25/M |

This is likely because claude-haiku-4-5 is being aliased/upgraded by the Anthropic API to claude-opus-4-8 at runtime.

Current Configuration

| Setting | Value |
| --- | --- |
| Configured model | claude-haiku-4-5 (frontmatter) |
| Actual runtime model | claude-opus-4-8 (all 16 runs) |
| Tools loaded | bash only (github: false) |
| Tools actually used | 1: bash (single cat call) |
| Network groups | defaults |
| Pre-agent steps | ✅ Yes (5 steps pre-compute all work) |
| Prompt size | 7,092 chars (~1,750 tokens) |
| LLM turns | 1 |

Recommendations

1. 🔴 Fix Model Aliasing — Use claude-haiku-4-5-20251001 (Pinned Date)

Estimated savings: ~$0.39/run (~80%), ~$712/year

The unversioned claude-haiku-4-5 alias appears to be mapped to claude-opus-4-8 at runtime; the exact cause is not yet confirmed. Pin the model to an explicit dated version so that the intended haiku model is used.

Change in smoke-claude.md frontmatter:

# Current (broken alias):
engine:
  id: claude
  model: claude-haiku-4-5

# Option A: pin to the dated haiku-4-5 version:
engine:
  id: claude
  model: claude-haiku-4-5-20251001

# Option B: pin to the older claude-3-5-haiku dated snapshot (a fixed version, not an alias):
engine:
  id: claude
  model: claude-3-5-haiku-20241022

After changing, recompile:

gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts

2. 🟡 Stabilize Cache Write Behavior — Investigate Bimodal Pattern

Estimated savings: ~$0.20/run (~41% of the average run cost) on runs that start with a cold cache

Cache writes show a bimodal distribution:

  • Warm cache (7.5% write rate): 26,322–26,945 tokens written → AIC ~35
  • Cold cache (17.1% write rate): 61,769–61,818 tokens written → AIC ~55

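The 12-hour schedule (22 */12 * * *) fires scheduled runs at 00:22 and 12:22 UTC, because */12 in the hour field matches hours 0 and 12. Anthropic's prompt cache TTL is 5 minutes for standard caching, refreshed each time the cached prefix is read. The cache is therefore warm only if another run (for example, a PR-triggered one) executed within the previous 5 minutes.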
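The fire times of the schedule can be checked by expanding the cron fields by hand. The following is a minimal sketch, not a full cron parser: it only understands the `N`, `*`, and `*/S` forms and assumes the day and month fields are `*`, as in the schedule quoted above. The function names are illustrative.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field, limited to the forms 'N', '*', and '*/S'."""
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        # A step starts counting at the low end of the field's range.
        return list(range(lo, hi + 1, step))
    return [int(field)]


def daily_fire_times(expr: str) -> list[str]:
    """Return the HH:MM (UTC) times at which a 5-field cron expression fires each day.

    Only the minute and hour fields are interpreted; the remaining fields
    are assumed to be '*'.
    """
    minute_field, hour_field, *_ = expr.split()
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


if __name__ == "__main__":
    times = daily_fire_times("22 */12 * * *")
    print("scheduled fire times (UTC):", times)
    # Gap between consecutive scheduled runs, to compare with the 5-minute TTL.
    print("minutes between scheduled runs:", 24 * 60 // len(times))
```

Comparing the printed gap with the 5-minute TTL shows whether one scheduled run can ever warm the cache for the next.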

The ~35K extra tokens written on cold runs (about $0.22 of extra write cost at the opus-4-8 rate) suggest two distinct cache segments: a smaller one (~26K) representing the truly static system context, and a second ~35K segment representing content that is written once and otherwise expected to be served from a prior run's cache.

Investigation steps:

  1. Check whether the 26K vs 61K split correlates with PR trigger vs scheduled trigger
  2. If PR-trigger runs always write 61K (because they fire cold), consider whether the scheduled run frequency could be reduced
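The first investigation step could be sketched as a small tally of warm and cold runs per trigger type. This assumes each run is available as a record with a trigger label and a cache-write token count; the field names `trigger` and `cache_write_tokens` and the 40,000-token threshold are hypothetical and not taken from the workflow's actual usage files.

```python
from collections import Counter

# Boundary between the two clusters reported above (~26K warm, ~61K cold).
# The exact value is an assumption; anything between the clusters works.
COLD_THRESHOLD = 40_000


def classify(cache_write_tokens: int) -> str:
    """Label a run 'cold' or 'warm' from its cache-write token count."""
    return "cold" if cache_write_tokens >= COLD_THRESHOLD else "warm"


def tally_by_trigger(runs: list[dict]) -> dict[str, Counter]:
    """Count warm and cold runs for each trigger type."""
    tally: dict[str, Counter] = {}
    for run in runs:
        label = classify(run["cache_write_tokens"])
        tally.setdefault(run["trigger"], Counter())[label] += 1
    return tally


if __name__ == "__main__":
    # Placeholder records: the token counts are the cluster bounds from this
    # report, but the trigger labels are invented purely for illustration.
    sample = [
        {"trigger": "schedule", "cache_write_tokens": 61_769},
        {"trigger": "schedule", "cache_write_tokens": 61_818},
        {"trigger": "pull_request", "cache_write_tokens": 26_322},
        {"trigger": "pull_request", "cache_write_tokens": 26_945},
    ]
    for trigger, counts in tally_by_trigger(sample).items():
        print(trigger, dict(counts))
```

If every trigger type shows a mix of warm and cold runs, the split is more likely driven by timing relative to the previous run than by the trigger itself.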

3. 🟢 Verify Haiku Task Capability (No Regression Risk)

Estimated savings: confirms $0.39/run savings are safe to claim

The task is deliberately simple by design:

  1. Read pre-computed /tmp/gh-aw/agent/final-result.json (one bash call)
  2. Call add_comment or noop based on result

This is well within claude-haiku-4-5 / claude-3-5-haiku capability. The single LLM turn, minimal tool use, and pre-computed result structure confirm no capability upgrade is needed.

Verification: Run one PR with the pinned haiku model and confirm:

  • The add_comment safe output is called with correct data
  • The verify_token_usage post-job passes
  • primary_model in agent_usage.json matches the intended haiku variant
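The last check could be automated with a few lines. This is a sketch under the assumption that agent_usage.json is a JSON object with a top-level string field `primary_model`; the actual layout of that file is not shown in this report, and the function name is illustrative.

```python
import json


def check_primary_model(usage_json: str, expected_prefix: str = "claude-haiku") -> bool:
    """Return True if the usage document's primary_model starts with expected_prefix.

    Assumes primary_model is a top-level string field; a missing or
    non-string value is treated as a failed check.
    """
    usage = json.loads(usage_json)
    model = usage.get("primary_model", "")
    return isinstance(model, str) and model.startswith(expected_prefix)


if __name__ == "__main__":
    # Inline example documents; in CI the text would be read from agent_usage.json.
    print(check_primary_model('{"primary_model": "claude-haiku-4-5-20251001"}'))
    print(check_primary_model('{"primary_model": "claude-opus-4-8"}'))
```

A prefix match is used so the check passes for whichever dated haiku version ends up pinned; pass a different `expected_prefix` if Option B is chosen.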

Cache Analysis (Anthropic-Specific)

This is a single-turn workflow — all LLM interaction happens in one request.

| Scenario | Input | Output | Cache Read | Cache Write | Cost |
| --- | --- | --- | --- | --- | --- |
| Warm cache | 3,300 | 854 | 330,971 | 26,945 | $0.37 |
| Cold cache | 3,300 | 854 | 295,903 | 61,769 | $0.57 |
| Average | 3,293 | 854 | 298,768 | 48,079 | $0.49 |
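The costs in the scenario table can be reproduced from the per-million-token rates in the pricing table. The sketch below copies both sets of figures from this report and assumes that cost is simply tokens times rate summed over the four token categories; the names `RATES`, `SCENARIOS`, and `run_cost` are illustrative.

```python
# Per-million-token rates (USD), copied from the pricing table in this report.
RATES = {
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25},
    "claude-opus-4-8": {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
}

# Token counts per run, copied from the scenario table above.
SCENARIOS = {
    "warm": {"input": 3_300, "output": 854, "cache_read": 330_971, "cache_write": 26_945},
    "cold": {"input": 3_300, "output": 854, "cache_read": 295_903, "cache_write": 61_769},
    "average": {"input": 3_293, "output": 854, "cache_read": 298_768, "cache_write": 48_079},
}


def run_cost(tokens: dict, rates: dict) -> float:
    """Return the USD cost of one run: sum of tokens * rate per category, per million."""
    return sum(tokens[kind] * rates[kind] for kind in tokens) / 1_000_000


if __name__ == "__main__":
    for name, tokens in SCENARIOS.items():
        opus = run_cost(tokens, RATES["claude-opus-4-8"])
        haiku = run_cost(tokens, RATES["claude-haiku-4-5"])
        print(f"{name:8s} opus-4-8=${opus:.2f}  haiku-4-5=${haiku:.2f}")
```

Because every haiku-4-5 rate in the pricing table is one fifth of the corresponding opus-4-8 rate, the haiku cost of any scenario is one fifth of its opus cost regardless of the token mix.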

Cache write amortization: With a 5-minute cache TTL, the 26K–61K tokens written per run are not reused by later runs, which are hours apart. The cache reads (296K–331K tokens per run) presumably come from Anthropic's infrastructure serving previously cached tool schemas and system context within the TTL window; this attribution is inferred from the usage data rather than confirmed.

Cache cost vs benefit:

  • Cache writes cost $6.25/M tokens (opus-4-8 rate)
  • Cache reads cost $0.50/M tokens
  • For the 61K cold-write case: write cost = $0.386. For comparison, reading the ~330K cached tokens of a warm run uncached at $5/M would cost ~$1.65 instead of ~$0.17 at the cache-read rate
  • At the haiku-4-5 rate ($1.25/M write, $0.10/M read), the same 61K write costs only ~$0.077, so the cold-run write penalty drops to one fifth

Expected Impact

| Metric | Current | Projected (after fix) | Savings |
| --- | --- | --- | --- |
| Total tokens/run | 350K eff. input | 350K eff. input | 0% |
| Cost/run | $0.49 | $0.10 | -80% |
| AIC/run | 47.1 | ~9.4 (est.) | -80% |
| LLM turns | 1 | 1 | 0 |
| Session time | ~5.5m | ~5.5m | ~0% |
| Annual cost | ~$890 | ~$178 | -$712 |

Implementation Checklist

  • Confirm which dated haiku model ID resolves correctly: claude-haiku-4-5-20251001 or claude-3-5-haiku-20241022
  • Update model: in .github/workflows/smoke-claude.md to the pinned haiku model ID
  • Recompile: gh aw compile .github/workflows/smoke-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Verify CI passes and agent_usage.json shows the intended haiku model
  • Confirm verify_token_usage job passes (token budget check)
  • Compare AIC on new run vs baseline (expect ~47 → ~9)
  • Investigate bimodal cache write pattern (PR vs scheduled trigger correlation)

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • awmgmcpg

To allow this domain, add it to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "awmgmcpg"

See Network Configuration for more information.

Generated by Daily Claude Token Optimization Advisor · 150.5 AIC · ⊞ 6.3K
