Try Demo 🤗HuggingFace | 🚀ModelScope
Join us on 🎮Discord | 💬WeChat
If you like WebQA Agent, please give us a ⭐ on GitHub!
🤖 WebQA Agent is a fully automated web testing agent for multi-modal understanding, test generation, and end-to-end evaluation of functionality, performance, and UX. ✨
- Core Features
- Examples
- Quick Start
- Usage
- Extending WebQA Agent Tools
- Roadmap
- Acknowledgements
- License
WebQA Agent provides two testing modes for different scenarios: 🤖 Generate Mode and 📋 Run Mode.
| Capability | 🤖 Generate Mode | 📋 Run Mode |
|---|---|---|
| Core Features | AI-driven discovery -> Dynamic generation -> Precise execution | Execute based on instructions and expected verification |
| Use Cases | New features and comprehensive quality assurance | Repeatable and regression testing scenarios |
| User Input | Minimal: Only URL or a one-sentence business goal | Structured: Simple natural language step descriptions |
| Advantages | Reflection-based planning, adaptive to UI changes; Configurable functional / performance / security / UX evaluation for comprehensive QA | Stable and predictable results; No selector maintenance; Real-time Console and Network monitoring |
- 🤖 Conversational UI: Autonomously plans goals and interacts across a dynamic chat interface
- 🎨 Creative Page: Explores page structure, identifies elements
Try Demo: 🤗Hugging Face · 🚀ModelScope
🏎️ Recommended: uv (Python >= 3.11):
# 1) Create project and install package
uv init my-webqa && cd my-webqa
uv add webqa-agent
# 2) Install browser (required)
uv run playwright install chromium
# 3) Generate Mode
# Initialize Gen mode configuration (config.yaml)
uv run webqa-agent init -m gen
# Edit config.yaml: target.url, llm_config.api_key
# Configure test_config
# For more details, see "Usage > Generate Mode - Configuration" below
uv run webqa-agent gen # Run Generate Mode
uv run webqa-agent gen -c /path/to/config.yaml -w 4 # Generate Mode with specified config and 4 parallel workers
# 4) Run Mode
# Initialize Run mode configuration (config_run.yaml)
uv run webqa-agent init -m run
# Edit config_run.yaml: target.url, llm_config.api_key
# Write natural language test cases
# For more details, see "Usage > Run Mode - Configuration" below
uv run webqa-agent run # Run Run Mode
uv run webqa-agent run -c /path/to/config_run.yaml -w 4 # Run Mode with specified config and 4 parallel workers
Performance testing (Lighthouse): npm install lighthouse chrome-launcher (requires Node.js ≥18)
Security testing (Nuclei):
brew install nuclei # macOS
nuclei -ut # Update templates
# Linux/Windows: https://github.com/projectdiscovery/nuclei/releases
Please ensure Docker is installed (recommended Docker >= 24.0, Docker Compose >= 2.32). Official guide: Docker Installation
mkdir -p config \
&& curl -fsSL https://raw.githubusercontent.com/MigoXLab/webqa-agent/main/config/config.yaml.example -o config/config.yaml
# Edit config.yaml: set target.url, llm_config.api_key, etc.
curl -fsSL https://raw.githubusercontent.com/MigoXLab/webqa-agent/main/start.sh | bash
The configuration file must include the test_config field to define test types.
- Functional Testing (AI type): Validates correctness of page functionality. Optional configurations:
- business_objectives: Specifies business goals to steer test focus and coverage.
- dynamic_step_generation: Enables automatic generation of additional steps when new UI elements are detected during execution.
- filter_model: Configures a lightweight model for pre-filtering page elements to improve planning efficiency.
- Functional Testing (default type): Does not rely on LLMs; focuses only on interaction success (clicks, navigation, etc.).
- User Experience Testing: Evaluates visual quality, typography/grammar, layout rendering, and provides optimization suggestions based on best practices.
- Performance Testing: Based on Lighthouse; evaluates performance, SEO, and related metrics.
- Security Testing: Based on Nuclei, scans web security vulnerabilities and potential risks.
For more details, please refer to docs/MODES&CLI.md
target:
  url: https://example.com # Website URL to test
  description: Website QA testing
  max_concurrent_tests: 2 # Optional, default 2
test_config:
  function_test: # Functional testing
    enabled: True
    type: ai # 'default' or 'ai'
    business_objectives: Test search functionality, generate 3 test cases
    dynamic_step_generation:
      enabled: True # Enable dynamic step generation
      max_dynamic_steps: 10
      min_elements_threshold: 1
  ux_test: # User experience testing
    enabled: True
  performance_test: # Performance analysis (requires Lighthouse)
    enabled: False
  security_test: # Security scanning (requires Nuclei)
    enabled: False
llm_config: # LLM configuration, supports OpenAI, Anthropic Claude, Google Gemini, and OpenAI-compatible models (e.g., Doubao, Qwen)
  model: gpt-4.1-2025-04-14 # Primary model
  filter_model: gpt-4o-mini # Lightweight model for element filtering (optional)
  api_key: your_api_key # Or set via environment variable (OPENAI_API_KEY)
  base_url: https://api.openai.com/v1 # Optional, API endpoint. For OpenAI-compatible models (Doubao, Qwen, etc.), set to their API endpoint
  temperature: 0.1 # Optional, model temperature
  # For detailed configuration examples (OpenAI, Claude, Gemini) and reasoning settings,
  # see config/config.yaml.example
browser_config:
  viewport: {"width": 1280, "height": 720}
  headless: False # Auto True in Docker
  language: en-US
report:
  language: en-US # zh-CN or en-US
log:
  level: info # debug, info, warning, error
Run Mode configuration must include the cases field.
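Generate Mode requires target.url, llm_config.api_key (or the OPENAI_API_KEY environment variable), and a test_config section. As a sketch of a pre-flight check over a parsed config dict — validate_config is a hypothetical helper for illustration, not part of the webqa-agent CLI, which performs its own validation:

```python
import os

def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed config dict (empty = OK).

    Hypothetical helper for illustration; webqa-agent validates its
    configuration internally.
    """
    problems = []
    if not cfg.get("target", {}).get("url"):
        problems.append("target.url is required")
    llm = cfg.get("llm_config", {})
    if not llm.get("api_key") and not os.environ.get("OPENAI_API_KEY"):
        problems.append("llm_config.api_key or OPENAI_API_KEY must be set")
    if "test_config" not in cfg:
        problems.append("test_config is required to define test types")
    return problems

# Example: a config that forgot the test_config section
cfg = {"target": {"url": "https://example.com"},
       "llm_config": {"api_key": "sk-test"}}
print(validate_config(cfg))  # ['test_config is required to define test types']
```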
- Multi-modal Interaction: Use `action` to describe visible text, images, or relative positions on the page. Supported browser actions include click, hover, input, clear, keyboard input, scrolling, mouse movement, file upload, drag-and-drop, and wait; page actions include navigation and back.
- Multi-modal Verification: Use `verify` to keep the agent on track by validating visual content, URLs, paths, and combined image–element conditions.
- End-to-End Monitoring: Monitors `Console` logs and `Network` request status, with configurable `ignore_rules` to suppress known errors.
For more details and test case writing specifications, please refer to docs/MODES&CLI.md
target:
  url: https://example.com # Target website URL
  max_concurrent_tests: 2 # Maximum concurrent test count
llm_config: # LLM configuration
  api: openai
  model: gpt-4o-mini
  api_key: your_api_key_here
  base_url: https://api.openai.com/v1
browser_config:
  viewport: {"width": 1280, "height": 720}
  headless: False # Auto True in Docker
  language: en-US
  # cookies: /path/to/cookie.json
ignore_rules: # Ignore rules configuration (optional)
  network: # Network request ignore rules
    - pattern: ".*\\.google-analytics\\.com.*"
      type: "domain"
  console: # Console log ignore rules
    - pattern: "Failed to load resource.*favicon"
      match_type: "regex"
    - pattern: "Warning:"
      match_type: "contains"
cases: # Test case list
  - name: Image Upload # Test case name
    steps: # Test steps
      - action: Click the upload icon (the image icon in the input box, next to the Baidu search button) to upload a file
        args:
          file_path: ./tests/data/test.jpeg
      - action: Wait for image upload
      - verify: Verify that the input field displays an open palm/hand icon image
      - action: Enter "How many fingers are in the image?" in the search input box, then press Enter, wait 2 seconds
Test reports are generated in the reports/ directory. Open the HTML file to view detailed results.
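The console ignore_rules distinguish two match types, "regex" and "contains". Their semantics are plausibly equivalent to the following sketch — should_ignore is a hypothetical illustration, not the actual implementation:

```python
import re

def should_ignore(message: str, rules: list[dict]) -> bool:
    """Check a console message against ignore rules (illustrative only).

    - match_type "regex":    the pattern is matched as a regular expression
    - match_type "contains": the pattern must appear as a substring
    """
    for rule in rules:
        pattern = rule["pattern"]
        if rule.get("match_type") == "regex":
            if re.search(pattern, message):
                return True
        else:  # "contains" (treated here as the default)
            if pattern in message:
                return True
    return False

rules = [
    {"pattern": "Failed to load resource.*favicon", "match_type": "regex"},
    {"pattern": "Warning:", "match_type": "contains"},
]
print(should_ignore("Warning: deprecated API", rules))               # True
print(should_ignore("Failed to load resource: favicon.ico", rules))  # True
print(should_ignore("TypeError: x is undefined", rules))             # False
```

A message matching any rule is dropped from the report, which is why overly broad patterns (e.g. a bare "Warning:") should be used with care.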
WebQA Agent supports custom tool development for domain-specific testing capabilities.
| Document | Description |
|---|---|
| Custom Tool Development | Quick reference for creating custom tools |
| LLM Context Document | Comprehensive guide for AI-assisted development, useful for vibe coding |
We welcome contributions! Check out existing tools for examples.
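As a rough illustration of what a domain-specific tool could look like, here is a hypothetical sketch. The base class, class names, and method signatures are assumptions for illustration only; the actual tool interface is defined in the Custom Tool Development guide:

```python
import re

class BaseTool:
    """Assumed minimal base class: a named tool with a run() hook.

    Hypothetical; see the Custom Tool Development guide for the real interface.
    """
    name = "base"

    def run(self, page_text: str) -> dict:
        raise NotImplementedError

class BrokenPlaceholderChecker(BaseTool):
    """Flags unresolved template placeholders like {{user_name}} in page text."""
    name = "broken_placeholder_checker"

    def run(self, page_text: str) -> dict:
        leaks = re.findall(r"\{\{\s*[\w.]+\s*\}\}", page_text)
        return {"passed": not leaks, "findings": leaks}

checker = BrokenPlaceholderChecker()
print(checker.run("Welcome back, {{user_name}}!"))
# {'passed': False, 'findings': ['{{user_name}}']}
```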
- Interaction & Visualization: Real-time display of reasoning processes
- Generate Mode Expansion: Integration of additional evaluation dimensions
- Tool Agent Context Integration: More comprehensive and precise execution
- natbot: Drive a browser with GPT-3
- Midscene.js: AI Operator for Web, Android, Automation & Testing
- browser-use: AI Agent for Browser control
This project is licensed under the Apache 2.0 License.