Summary
Agents can exploit the benchmark by accessing git history to retrieve original function implementations instead of writing them from scratch.
Description
The Commit0 benchmark works by providing repositories where code has been stripped out and agents are instructed to implement the functions. However, the current setup clones repositories with full git history:
# In commit0/harness/spec.py line 117
f"git clone -o origin https://github.com/{repo} {self.repo_directory}"
While git remote remove origin is called later (line 122), the full git history is already cloned locally. This allows agents to:
- Run git log to see commit history
- Run git diff or git show to see the original implementations that were removed
- Copy-paste the original code instead of implementing from scratch
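A quick way to confirm whether a prepared repository still exposes history is to count the commits reachable from any ref. The following is only a sketch: the helper names (reachable_commit_count, history_is_exposed) and the injectable git parameter are hypothetical and not part of the Commit0 harness. It relies on git rev-list --count --all, which prints the number of commits reachable from all refs.

```python
import subprocess
from typing import Callable, Sequence


def _git_output(args: Sequence[str], cwd: str) -> str:
    """Run a git subcommand in `cwd` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def reachable_commit_count(
    repo_dir: str, git: Callable[[Sequence[str], str], str] = _git_output
) -> int:
    """Count the commits reachable from any ref in the repository."""
    return int(git(["rev-list", "--count", "--all"], repo_dir).strip())


def history_is_exposed(
    repo_dir: str, git: Callable[[Sequence[str], str], str] = _git_output
) -> bool:
    """Return True if more than one commit is reachable (hypothetical check)."""
    return reachable_commit_count(repo_dir, git) > 1
```

If this returns True for a task repository, an agent working inside it can likely reach earlier commits with ordinary git commands.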
Evidence
This was observed by @fjzzq2002, who found an agent doing this on the "portalocker" repository:
- At turn 121, the agent ran git log in one-line format
- The agent's reasoning included checking git history to restore the original implementation
- The agent then used git history to retrieve and restore original implementations
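To look for similar behaviour in other runs, one could scan the shell commands an agent issued for git subcommands that read history. This is a rough sketch under assumptions: the trajectory is taken to be a plain list of command strings, one per turn, which may not match the actual log format, and flag_history_turns is a hypothetical helper. It only flags candidates for manual review, since git diff is also used legitimately on the working tree, and it will miss forms such as git -C <dir> log.

```python
import re
from typing import List, Sequence

# Matches "git" followed directly by a history-reading subcommand.
HISTORY_COMMAND = re.compile(r"\bgit\s+(?:log|show|diff|reflog)\b")


def flag_history_turns(commands: Sequence[str]) -> List[int]:
    """Return 1-based turn numbers whose command reads git history."""
    return [
        turn
        for turn, command in enumerate(commands, start=1)
        if HISTORY_COMMAND.search(command)
    ]
```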
Impact
This undermines the validity of Commit0 benchmark results since agents can achieve high scores by exploiting git history rather than demonstrating actual code implementation capabilities.
Suggested Fix
Use a shallow clone with --depth 1 to prevent access to git history:
f"git clone --depth 1 -o origin https://github.com/{repo} {self.repo_directory}"
This should be applied in both Commit0Spec.make_repo_script_list() (line 117) and SWEBenchSpec.make_repo_script_list() (line 221).
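Since the same command is needed in two places, one option is to build it in a single helper so both specs stay consistent. This is a sketch only: build_clone_command is a hypothetical function, not existing harness code, and the repository name and directory in the usage note are made up for illustration.

```python
def build_clone_command(repo: str, repo_directory: str, depth: int = 1) -> str:
    """Build a shallow git clone command string for a GitHub repository.

    `repo` is expected in "owner/name" form. `depth` limits how many
    commits are fetched from the tip of the cloned branch.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return (
        f"git clone --depth {depth} -o origin "
        f"https://github.com/{repo} {repo_directory}"
    )
```

For example, build_clone_command("owner/name", "/testbed") yields the suggested command with --depth 1. Note that --depth implies --single-branch unless --no-single-branch is given, so other branches are not fetched either.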
Related
Thanks
Thanks to @fjzzq2002 (Ziqian Zhong) for discovering and reporting this vulnerability to the OpenHands team.