This tool provides a command-line interface for configuring and running AI agents to assist with code development and testing.
Step 1: Clone [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) and install [OpenHands](https://github.com/All-Hands-AI/OpenHands/blob/main/evaluation/README.md#development-environment)
Step 2: Create `config.toml` with the following contents:
```toml
[core]
workspace_base="~/OpenHands/evaluation/benchmarks/commit0_bench"
[llm]
model="anthropic/claude-3-5-sonnet-20241022"
api_key="..."
embedding_model=""
temperature = 0.0
caching_prompt = true
```
Step 3: Run
```bash
./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh SPLIT MODEL HEAD CodeActAgent 16 STEPS PARALLEL_NUMBER
# Example
./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh lite llm.eval_deepseekv3 HEAD CodeActAgent 16 100 2
```
Step 3.1: To parallelize the runs on the OpenHands remote server, set the following environment variables before running:
```bash
export RUNTIME=remote
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"
export ALLHANDS_API_KEY=...
```
## Quick Start
Configure an agent:
```bash
agent config [OPTIONS] AGENT_NAME
```
Run an agent on a specific branch:
```bash
agent run [OPTIONS] BRANCH
# Example
agent run sonnet --max-parallel-repos 16 --agent-config-file .agent_sonnet.yaml --commit0-config-file .commit0.yaml
```
For more detailed information on available commands and options:
```bash
agent -h
agent config -h
agent run -h
```
Use `agent config [OPTIONS] AGENT_NAME` to set up the configuration for an agent.
Available options include:
- `--agent_name: str`: Agent to use, we only support aider for now. [Default: `aider`]
- `--model-name: str`: LLM model to use, check here for all supported models. [Default: `claude-3-5-sonnet-20240620`]
- `--use-user-prompt: bool`: Use a custom prompt instead of the default prompt. [Default: `False`]
- `--user-prompt: str`: The prompt sent to the agent. [Default: See code for details.]
- `--run-tests: bool`: Run tests after code modifications for feedback. You need to set up Docker or Modal before running tests; refer to the commit0 docs. [Default: `False`]
- `--max-iteration: int`: Maximum number of agent iterations. [Default: `3`]
- `--use-repo-info: bool`: Include the repository information. [Default: `False`]
- `--max-repo-info-length: int`: Maximum length of the repository information to use. [Default: `10000`]
- `--use-unit-tests-info: bool`: Include the unit tests information. [Default: `False`]
- `--max-unit-tests-info-length: int`: Maximum length of the unit tests information to use. [Default: `10000`]
- `--use-spec-info: bool`: Include the spec information. [Default: `False`]
- `--max-spec-info-length: int`: Maximum length of the spec information to use. [Default: `10000`]
- `--use-lint-info: bool`: Include the lint information. [Default: `False`]
- `--max-lint-info-length: int`: Maximum length of the lint information to use. [Default: `10000`]
- `--pre-commit-config-path: str`: Path to the pre-commit config file. This is needed for running lint. [Default: `.pre-commit-config.yaml`]
- `--agent-config-file: str`: Path to write the agent config. [Default: `.agent.yaml`]
- `--add-import-module-to-context: bool`: Add import module to context. [Default: `False`]
- `--record-test-for-each-commit: bool`: Record test results for each commit. If set to `True`, the test results are saved in `experiment_log_dir/eval_results.json`. [Default: `False`]
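When scripting several configurations, the options above can be assembled programmatically. The sketch below is illustrative only: `build_agent_config_command` is a hypothetical helper, and it assumes that boolean options are passed as bare flags and all other options as `--name value`. Check `agent config -h` for the syntax the CLI actually accepts.

```python
import shlex
from typing import List


def build_agent_config_command(agent_name: str, **options) -> List[str]:
    """Turn keyword options into an argv list for `agent config`.

    Assumption: booleans become bare flags (omitted when False) and
    every other value is passed as `--name value`.
    """
    argv = ["agent", "config"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            if value:
                argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    argv.append(agent_name)
    return argv


cmd = build_agent_config_command(
    "aider",
    run_tests=True,
    max_iteration=3,
    use_spec_info=True,
    max_spec_info_length=10000,
    agent_config_file=".agent.yaml",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been checked against the help output.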
Use `agent run [OPTIONS] BRANCH` to execute an agent on a specific branch.
Available options include:
- `--branch: str`: Branch to run the agent on; you can specify the name of the branch.
- `--backend: str`: Test backend to run the agent on; ignore this option if you are not adding the `run_tests` option to the agent. [Default: `modal`]
- `--log-dir: str`: Log directory to store the logs. [Default: `logs/aider`]
- `--max-parallel-repos: int`: Maximum number of repositories for the agent to run in parallel. Runs sequentially if set to 1. [Default: `1`]
- `--display-repo-progress-num: int`: Number of repo progress bars displayed while running. [Default: `5`]
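The effect of `--max-parallel-repos` can be pictured as a bounded worker pool. This is a sketch of the general idea, not the tool's implementation: `run_on_repos` and `fake_agent` are hypothetical names, and the real tool may use processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_on_repos(
    repos: List[str],
    work: Callable[[str], str],
    max_parallel_repos: int = 1,
) -> List[str]:
    """Apply `work` to each repo, at most `max_parallel_repos` at a time."""
    if max_parallel_repos <= 1:
        # Sequential mode, mirroring the documented behavior for a value of 1.
        return [work(repo) for repo in repos]
    with ThreadPoolExecutor(max_workers=max_parallel_repos) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(work, repos))


def fake_agent(repo: str) -> str:
    """Placeholder for running the agent on one repository."""
    return f"{repo}: done"


results = run_on_repos(["repo_a", "repo_b", "repo_c"], fake_agent, max_parallel_repos=2)
print(results)
```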
Step 1: Configure aider: `agent config aider`
Step 2: Run aider on a branch: `agent run aider_branch`
Refer to the `Agents` class in `agent/agents.py`. You can design your own agent by inheriting from the `Agents` class and implementing the `run` method.
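As a rough illustration of the inheritance pattern, the sketch below defines a stand-in `Agents` base class so that it runs on its own. The constructor argument and the signature of `run` are assumptions; the real definitions live in `agent/agents.py` and may differ. `EchoAgent` is a hypothetical example.

```python
from abc import ABC, abstractmethod


class Agents(ABC):
    """Stand-in for the base class in agent/agents.py (assumed shape)."""

    def __init__(self, max_iteration: int):
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, message: str) -> str:
        """Run the agent on a message and return its output."""


class EchoAgent(Agents):
    """Hypothetical custom agent that only records what it was asked to do."""

    def run(self, message: str) -> str:
        steps = [f"iteration {i + 1}: {message}" for i in range(self.max_iteration)]
        return "\n".join(steps)


agent = EchoAgent(max_iteration=3)
output = agent.run("fill in function bodies")
print(output)
```

A real agent would replace the body of `run` with calls to its LLM backend and match whatever arguments and return type the actual base class declares.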
Aider automatically retries certain API errors. For details, see here.
When increasing --max-parallel-repos, be mindful of aider's 60-second retry timeout. Set this value according to your API tier to avoid RateLimitErrors stopping processes.
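To see why a fixed retry window matters under parallel load, consider this sketch of retrying with exponential backoff inside a time budget. Only the 60-second figure comes from the note above; the initial delay, the doubling schedule, and the names `call_with_retry` and `RateLimitError` are assumptions for illustration and are not aider's code.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""


def call_with_retry(
    call: Callable[[], str],
    timeout: float = 60.0,
    initial_delay: float = 0.125,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `call` with doubling delays until the wait budget is spent."""
    delay = initial_delay
    waited = 0.0
    while True:
        try:
            return call()
        except RateLimitError:
            if waited + delay > timeout:
                raise  # budget exhausted: the error reaches the caller
            sleep(delay)
            waited += delay
            delay *= 2


attempts = {"count": 0}


def flaky() -> str:
    """Fail twice with a rate-limit error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("rate limited")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```

If many repositories share one API key, each worker can stay rate-limited for longer than the budget allows, at which point the error propagates and that process stops.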
Currently, the agent skips files with more than 1500 lines. See `agent/agent_utils.py#L199` for details.
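A filter of this kind might look like the following sketch. It is not the code in `agent/agent_utils.py`: `files_to_process` is a hypothetical helper, and the assumption is that a file with exactly 1500 lines is still processed.

```python
import tempfile
from pathlib import Path
from typing import List

MAX_LINES = 1500  # threshold taken from the note above


def files_to_process(paths: List[Path], max_lines: int = MAX_LINES) -> List[Path]:
    """Return only the files that have at most `max_lines` lines."""
    kept = []
    for path in paths:
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count <= max_lines:
            kept.append(path)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "small.py"
    large = Path(tmp) / "large.py"
    small.write_text("x = 1\n" * 10, encoding="utf-8")
    large.write_text("x = 1\n" * 1501, encoding="utf-8")
    kept_names = [p.name for p in files_to_process([small, large])]

print(kept_names)
```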
Running the full `all` commit0 split costs approximately $100.