This is a free test set for comparing plagiarism detection software. More will be added soon.
Description: calculates the factorial of a number
Challenge: very short programs with heavy obfuscation.
Number of programs: 21
- Original: 1
- Duplicates: 1
- Type-1 clones: 2
- Type-2 clones: 2
- Type-3 clones: 10
- Type-4 clones: 5
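The clone types above follow the taxonomy of Roy et al. Type-1 clones are identical except for whitespace, layout and comments. Type-2 clones also rename identifiers or change literals and types. Type-3 clones also add, remove or modify statements. Type-4 clones compute the same result with different syntax. This section does not reproduce the programs in the test set or state their language, so the snippets below are hypothetical Python illustrations. Each variant is stored as a source string and loaded into its own namespace, so every one can keep the name `factorial`. The names `VARIANTS` and `load` are illustrative only.

```python
# Hypothetical factorial variants illustrating the four clone types.
# Each entry is a complete module that defines `factorial`.
VARIANTS = {
    # Original program.
    "original": """
def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
""",
    # Type-1: only comments, whitespace and layout differ.
    "type1": """
def factorial(n):
    # multiply all integers from 2 to n
    result = 1
    for i in range(2, n + 1): result *= i
    return result
""",
    # Type-2: identifiers renamed and a literal changed (loop starts at 1, not 2).
    "type2": """
def factorial(k):
    acc = 1
    for j in range(1, k + 1):
        acc *= j
    return acc
""",
    # Type-3: one statement added (input check) and one statement modified.
    "type3": """
def factorial(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result = result * i
    return result
""",
    # Type-4: same result for n >= 0, different syntax (recursion instead of a loop).
    "type4": """
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)
""",
}


def load(name):
    """Execute one variant in a fresh namespace and return its `factorial`."""
    namespace = {}
    exec(VARIANTS[name], namespace)
    return namespace["factorial"]


# All variants agree on small non-negative inputs, even though their text differs.
for name in VARIANTS:
    f = load(name)
    assert [f(n) for n in range(8)] == [1, 1, 2, 6, 24, 120, 720, 5040], name
print("all variants agree")
```

A detector should flag all four variants as derived from the original. Token-based tools typically catch Type-1 and Type-2 clones. Type-4 clones are usually the hardest to detect because they share little text with the original.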
You may use these test sets for comparing plagiarism detection software or other use cases free of charge. Please see the attached license file. When using them for a paper, please cite like this:
- name: Source Code Plagiarism Test Sets
- year: 2014
- url: https://github.com/nordicway/SourceCode-Plagiarism-TestSets
I would love to add your own test sets here, so don't hesitate to contribute them.
Just create a new directory for each test set, describe it briefly in this README and send a pull request. A single test set should include one directory with the original source code plus a number of directories containing clones, with or without obfuscation. Clone types are determined using the categorization by Roy et al. [1].
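When you label a contributed clone, a simple token comparison can separate the two lowest categories. The helper below is hypothetical. `normalized_tokens` and `clone_type` are illustrative names and are not part of this repository, and the helper only handles Python source, read with the standard `tokenize` module. It reports a Type-1 clone when the token streams match after comments and layout tokens are dropped. It reports a Type-2 clone when they match once identifiers and literals are replaced by placeholders. It does not attempt to detect Type-3 or Type-4 clones, which need gap-tolerant or semantic comparison.

```python
import io
import keyword
import tokenize

# Comments and layout tokens are ignored. Caveat for this sketch: Python
# indentation is semantic, so two programs that differ only in indentation
# would wrongly compare as equal.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def normalized_tokens(source, blind_names=False):
    """Return the token strings of `source` without comments and layout.
    With blind_names=True, identifiers become 'ID' and literals become 'LIT'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if blind_names and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif blind_names and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(original, candidate):
    """Rough Type-1/Type-2 check. Anything else is reported as 'other'."""
    if normalized_tokens(original) == normalized_tokens(candidate):
        return "type-1"
    if normalized_tokens(original, True) == normalized_tokens(candidate, True):
        return "type-2"
    return "other (possibly type-3/4 or unrelated)"


ORIGINAL = "def factorial(n):\n    result = 1\n    for i in range(2, n + 1):\n        result *= i\n    return result\n"
RELAID = "def factorial(n):  # n!\n    result = 1\n    for i in range(2, n + 1): result *= i\n    return result\n"
RENAMED = "def factorial(k):\n    acc = 1\n    for j in range(1, k + 1):\n        acc *= j\n    return acc\n"

print(clone_type(ORIGINAL, RELAID))   # type-1
print(clone_type(ORIGINAL, RENAMED))  # type-2
```

Placeholder-based comparison is the idea behind many token-based detectors. Real tools usually add gap-tolerant matching on top of it so that they also catch Type-3 clones.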
You may add your own test sets for source code plagiarism to this repository, provided you own the rights to publish them. All code you commit to this repository will be made available under the MIT License.