# Agent Validation Tools

Automated testing and validation for agent preambles using LangChain + GitHub Copilot API.


## 🚀 Quick Start

```bash
# 1. Setup (10 minutes)
gh auth login
pip install langchain-github-copilot langchain-core
npm install @langchain/core @langchain/community langchain

# 2. Verify
python3 -c "from langchain_github_copilot import ChatGitHubCopilot; llm = ChatGitHubCopilot(); print('✅', llm.invoke('Hi').content)"

# 3. Build the validation tool
# See VALIDATION_TOOL_DESIGN.md for the implementation code
```


## 🎯 What This Does

Automatically test agent preambles by:

1. Loading the agent preamble as the system prompt
2. Executing a benchmark task via the GitHub Copilot API
3. Capturing the output and conversation history
4. Scoring the output against a rubric using LLM-as-judge
5. Generating detailed reports (JSON + Markdown)
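The five steps above can be sketched as a single orchestration function. This is a hypothetical outline, not the shipped tool: the `LLMCall` type, criterion shape, and judge prompt are all assumptions, and the LLM calls are injected so the flow can be exercised without a live Copilot session.

```typescript
// Hypothetical sketch of the validation pipeline; names are illustrative.
type LLMCall = (system: string, user: string) => Promise<string>;

interface RubricCriterion { name: string; maxPoints: number; }

interface ValidationResult {
  output: string;
  scores: Record<string, number>;
  total: number;
}

async function validateAgent(
  preamble: string,            // step 1: agent preamble as system prompt
  task: string,                // step 2: benchmark task
  rubric: RubricCriterion[],   // step 4: scoring rubric
  llm: LLMCall,                // Copilot-backed in the real tool
  judge: LLMCall,              // LLM-as-judge (may be the same model)
): Promise<ValidationResult> {
  // Steps 1-3: run the task with the preamble and capture the output.
  const output = await llm(preamble, task);

  // Step 4: ask the judge for an integer score per criterion, capped at max.
  const scores: Record<string, number> = {};
  for (const c of rubric) {
    const raw = await judge(
      "You are a strict evaluator. Reply with a single integer.",
      `Criterion: ${c.name} (max ${c.maxPoints})\nOutput:\n${output}`,
    );
    scores[c.name] = Math.min(c.maxPoints, parseInt(raw, 10) || 0);
  }

  // Step 5: the totals feed the JSON + Markdown reports.
  const total = Object.values(scores).reduce((a, b) => a + b, 0);
  return { output, scores, total };
}
```

Injecting `llm` and `judge` keeps the orchestration logic testable with stubs before wiring up the real Copilot client.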

## 🏗️ Architecture

```
TypeScript Tool → Python Bridge → GitHub Copilot API
                                   (GPT-4 + Claude)
```

### Why GitHub Copilot?

- ✅ Uses your existing subscription (no new costs)
- ✅ High quality (GPT-4 + Claude models)
- ✅ Simple setup (just authenticate)
- ✅ Fast (cloud inference)

## 📦 Files to Create

```
tools/
├── llm-client.ts              # Copilot client (TypeScript → Python)
├── validate-agent.ts          # Main validation script
├── evaluators/
│   └── index.ts               # LLM-as-judge evaluators
└── report-generator.ts        # Report formatting
```

Full code is provided in VALIDATION_TOOL_DESIGN.md.


## 🎯 Usage Examples

### Validate a Single Agent

```bash
npm run validate docs/agents/claudette-debug.md benchmarks/debug-benchmark.json
```

### Test Agentinator (Two-Hop)

```bash
npm run validate:agentinator -- \
  --agentinator docs/agents/claudette-agentinator.md \
  --requirement "Design debug agent" \
  --benchmark benchmarks/debug-benchmark.json \
  --baseline 92
```
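The flags in the command above could be parsed inside `validate-agent.ts` with Node's built-in `util.parseArgs` (available in Node 18.3+, consistent with the Node 18+ requirement below). This is a sketch; the helper name and the numeric default for `--baseline` are assumptions.

```typescript
// Hypothetical CLI parsing for the validate:agentinator entry point.
import { parseArgs } from "node:util";

function parseAgentinatorArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      agentinator: { type: "string" },   // path to the meta-agent preamble
      requirement: { type: "string" },   // what the generated agent must do
      benchmark:   { type: "string" },   // benchmark task + rubric JSON
      baseline:    { type: "string" },   // score to beat; parsed as string, then Number()
    },
  });
  return { ...values, baseline: Number(values.baseline ?? 0) };
}
```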

## 📊 Output

### Terminal

```
🔍 Validating agent: claudette-debug.md
⚙️  Executing benchmark task...
✅ Task completed in 12,451 tokens
📊 Evaluating output against rubric...
📈 Total score: 92/100
📄 Report saved to: validation-output/2025-10-15_claudette-debug.md
```

### Files Generated

```
validation-output/
├── 2025-10-15_claudette-debug.json    # Raw data
└── 2025-10-15_claudette-debug.md      # Readable report
```
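A `report-generator.ts` producing the pair of files above might look like this sketch. The `Report` shape and function names are assumptions; the real rubric criteria live in the benchmark JSON.

```typescript
// Hypothetical report formatting: one <date>_<agent> basename serves both
// the raw-data .json and the readable .md report.
interface Report {
  agent: string;                    // e.g. "claudette-debug"
  date: string;                     // ISO date, e.g. "2025-10-15"
  scores: Record<string, number>;   // per-criterion scores from the judge
  total: number;                    // sum of scores, out of 100
}

function reportBasename(r: Report): string {
  return `${r.date}_${r.agent}`;
}

function renderMarkdownReport(r: Report): string {
  return [
    `# Validation Report: ${r.agent}`,
    ``,
    `Date: ${r.date}`,
    ``,
    `| Criterion | Score |`,
    `| --- | --- |`,
    ...Object.entries(r.scores).map(([name, pts]) => `| ${name} | ${pts} |`),
    ``,
    `**Total: ${r.total}/100**`,
  ].join("\n");
}
```

The JSON sibling would simply be `JSON.stringify(r, null, 2)` written to `reportBasename(r) + ".json"`.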

## ⏱️ Timeline

| Phase | Task | Time |
| --- | --- | --- |
| Setup | Authenticate + install | 10 min |
| Implement | Create tool files | 4 hours |
| Benchmarks | Define tasks + rubrics | 1 hour |
| Test | First validation | 30 min |
| **Total** | Working system | 5.5 hours |

## 🔧 Requirements

- Node.js 18+ (for the TypeScript tool)
- Python 3.8+ (for the Copilot integration)
- GitHub Copilot subscription (uses your existing plan)
- GitHub CLI (`gh`) for authentication

## 🚀 Next Steps

1. Setup (10 min): Run the commands in SETUP.md
2. Implement (4 hours): Copy the code from VALIDATION_TOOL_DESIGN.md
3. Test (30 min): Validate the claudette-debug.md baseline
4. Iterate (ongoing): Test Agentinator-generated agents

## 📖 See Also

- `docs/agents/AGENTIC_PROMPTING_FRAMEWORK.md` - Principles for agent design
- `docs/agents/claudette-agentinator.md` - Meta-agent that builds agents
- `docs/agents/claudette-debug.md` - Gold-standard debug agent (92/100)
- `benchmarks/RESEARCH_AGENT_BENCHMARK.md` - Benchmark example

**Status:** Design complete, ready for implementation.