Malware Collection

This is the new, enhanced version of the malware-corpus repository. Our goal is to create a malware corpus which is at least 10 times bigger than the original corpus, contains benign files, supports injected malware generation, and can automatically benchmark different anti-malware solutions. At the same time, it should provide us with enough data to train larger AI models.

Quick start guide

Clone the repo

git clone git@github.com:sqpp/malware-collection.git

Install prerequisites

apt install fdupes

Start the prepare script

This takes a while: about 15-20 minutes to clone and prepare all files. The script downloads several repos from GitHub and other sources, then removes the .git and .github directories to prepare the dataset.

Please do not commit any files from external repos in the mega malware repository to avoid an oversized repo.

If you find a repo with benign or malicious scripts, change the prepare.py script to clone that one as well. The .gitignore file is set up so that files in any directory whose name starts with the 'dl_' prefix are not committed. If you have separate files that you want to add, please create a separate folder for your files under the respective directory.
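As an illustration of the 'dl_' convention, here is a minimal sketch of how a new external repo could be mapped to an ignored download directory. The helper names `download_dir_for` and `clone_command`, the example URL, and the use of a shallow clone are assumptions made for this example; the real prepare.py may organize its clone list differently.

```python
def download_dir_for(repo_url: str) -> str:
    """Derive a 'dl_'-prefixed directory name from a repo URL (hypothetical helper)."""
    # Take the last path component and drop a trailing ".git" if present.
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return "dl_" + name


def clone_command(repo_url: str, parent_dir: str) -> list:
    """Build the git command that clones repo_url below parent_dir."""
    target = f"{parent_dir}/{download_dir_for(repo_url)}"
    # A shallow clone is enough here, since the .git directory is removed afterwards.
    return ["git", "clone", "--depth", "1", repo_url, target]


if __name__ == "__main__":
    # Placeholder URL, not a real sample source.
    print(clone_command("https://example.com/someuser/some-samples.git", "raw-malware/php"))
```

Because the target directory name starts with 'dl_', anything cloned this way stays out of commits under the .gitignore rule described above.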

prepare.py will:

download git repos with benign and malicious samples
remove administrative .git and .github folders
remove files with non-ASCII filenames
remove duplicate files
remove too small (empty to 4 byte) files
remove too large files (files above 200k)
re-create .gitignore files (they are empty files, so the small-file cleanup removes them)
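The filtering steps in this list can be sketched roughly as follows. This is not the actual prepare.py; it is a minimal illustration that assumes "200k" means 200,000 bytes and that duplicates are detected by comparing SHA-256 digests of file contents. The names `should_remove` and `clean_tree` are hypothetical.

```python
import hashlib
from pathlib import Path

MIN_SIZE = 5            # files of 0 to 4 bytes count as too small
MAX_SIZE = 200 * 1000   # assumption: "200k" is read as 200,000 bytes


def should_remove(path: Path, seen_hashes: set) -> bool:
    """Return True if the file fails one of the filters listed above."""
    # Non-ASCII file name.
    if not path.name.isascii():
        return True
    # Too small or too large.
    size = path.stat().st_size
    if size < MIN_SIZE or size > MAX_SIZE:
        return True
    # Duplicate content: same digest as a file that was already kept.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


def clean_tree(root: Path) -> int:
    """Delete every file under root that fails a filter; return how many were removed."""
    seen = set()
    removed = 0
    # Sorting makes it deterministic which copy of a duplicate is kept.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if should_remove(path, seen):
            path.unlink()
            removed += 1
    return removed
```

Note that a size filter like this also deletes empty .gitignore files, which is why they have to be re-created as the last step.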

cd malware-mega-corpus/scripts
./prepare.py

Check file count stats

./stats.py

The counts of benign and malware files should be about the same. (A 10% difference is OK.)
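The balance rule can be written down as a small check. This sketch is not taken from stats.py; the function name `is_balanced` is hypothetical, and it assumes the 10% is measured relative to the larger of the two counts.

```python
def is_balanced(benign_count: int, malware_count: int, tolerance: float = 0.10) -> bool:
    """Return True if the two counts differ by at most `tolerance` of the larger one."""
    larger = max(benign_count, malware_count)
    if larger == 0:
        # An empty corpus is trivially balanced.
        return True
    return abs(benign_count - malware_count) / larger <= tolerance


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(is_balanced(1000, 950))   # within 10%
    print(is_balanced(1000, 800))   # more than 10% apart
```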

Folder structure

.
├── curated-corpus
│   ├── benign
│   └── malware
├── raw-malware
│   ├── bash
│   ├── c
│   ├── html
│   ├── java
│   ├── js
│   ├── perl
│   ├── php
│   ├── python
│   ├── ruby
│   └── xml
├── README.md
├── scripts
└── snippets
    ├── bash
    ├── c
    ├── html
    ├── java
    ├── js
    ├── perl
    ├── php
    ├── python
    ├── ruby
    └── xml

curated-corpus

This is the main directory. It is where curated malware and benign files are located or generated.

How to use

Prepare

First run the following script to initialize the dataset by downloading external source code:

cd ./scripts
./prepare.py

From this point it is the old howto: Fixme

You can use this repo to benchmark malware detection. The very first step is to check out the repo. Check it out into a whitelisted area, so the malware detector won't start to quarantine the files. /root is a safe place usually.

First run 01_copy_files.php to place the files:

./01_copy_files.php /home/malware-test

When it is done, you can start a full scan on the directory with the malware engine you are benchmarking.

Once the scan has finished, run 02_compare_files.php:

./02_compare_files.php /home/malware-test

This will give you the benchmark numbers.

To make it easier to find files that were not quarantined, you can run ./delete_empty_dirs.sh. If you add more files from any quarantine, you can use the ./delete_info_files.sh helper to remove the info files.

2021-06-08 Left files: 4799 from 22164 after SandboxScanner : 4433 Cleanup ratio: 78.3%

2021-06-26

    File count: 22102
    Cleaned files count: 682
    Deleted files count: 19835
    Not cleaned files count: 1585
    Total cleaned up files: 20517
    Cleanup ratio is 92.83%
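The 2021-06-26 figures are consistent with a cleanup ratio defined as cleaned plus deleted files divided by the total file count. The sketch below reproduces that arithmetic; the function name `cleanup_ratio` is hypothetical, and the formula is inferred from the numbers above rather than taken from 02_compare_files.php.

```python
def cleanup_ratio(total: int, cleaned: int, deleted: int) -> float:
    """Percentage of files that were either cleaned or deleted by the scanner."""
    return 100.0 * (cleaned + deleted) / total


if __name__ == "__main__":
    # Counts from the 2021-06-26 run listed above.
    ratio = cleanup_ratio(total=22102, cleaned=682, deleted=19835)
    print(f"Cleanup ratio is {ratio:.2f}%")  # prints "Cleanup ratio is 92.83%"
```

The same definition matches the 2021-06-08 entry: (22164 - 4799) / 22164 is about 78.3%.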

Save_signitures_from_JSON

This is a script that can sort and save benign and malicious script files from a JSON file.

Usage: Run it from the command line with the command below. Start with "python", then give the path to the script and the path to the JSON file, separated by a space:

python the/root/of/the_file/save_signitures_from_JSON.py the/root/of/the/JSON_file.json

The script only works if the file is in the same folder as the curated-corpus and snippets folders.
