Cynztya/malware-collection

 
 

Malware Collection

This is the new, enhanced version of the malware-corpus repository. Our goal is to create a malware corpus that is at least 10 times bigger than the original corpus, contains benign files, supports injected malware generation, and can automatically benchmark different anti-malware solutions. At the same time, it should provide enough data to train larger AI models.

Quick start guide

  1. Clone the repo
git clone [email protected]:sqpp/malware-collection.git
  1. Install the prerequisites
apt install fdupes
  1. Start the prepare script

This takes a while: expect 15-20 minutes to clone and prepare all files. The script downloads several repos from GitHub and other sources and removes their .git and .github directories to prepare the dataset.

Please do not commit any files from external repos in the mega malware repository to avoid an oversized repo.

If you find a repo with benign or malicious scripts, change the prepare.py script to clone that one as well. The .gitignore file is set up so that files in any directory starting with the 'dl_' prefix are never committed. If you have separate files that you want to add, please create a separate folder for your files under the respective directory.
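As a rough illustration of what adding one more repo could involve, the sketch below shallow-clones a repository into a 'dl_'-prefixed directory and strips its git metadata. It is not taken from prepare.py: the function names `clone_sample_repo` and `strip_git_dirs`, the use of `git clone --depth 1`, and the `dl_<name>` naming scheme are all assumptions made for the example.

```python
import shutil
import subprocess
from pathlib import Path

GIT_DIR_NAMES = {".git", ".github"}


def strip_git_dirs(root: Path) -> int:
    """Remove every .git and .github directory below root; return how many were removed."""
    # Collect the targets first so we never iterate over a tree while deleting it.
    targets = [p for p in root.rglob("*") if p.is_dir() and p.name in GIT_DIR_NAMES]
    removed = 0
    for path in targets:
        if path.exists():  # may already be gone if it was nested inside another target
            shutil.rmtree(path)
            removed += 1
    return removed


def clone_sample_repo(url: str, parent: Path, name: str) -> Path:
    """Shallow-clone url into parent/dl_<name> (hypothetical naming) and strip git metadata."""
    dest = parent / f"dl_{name}"
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    strip_git_dirs(dest)
    return dest
```

Because the target directory starts with 'dl_', the existing .gitignore rule would keep the downloaded files out of commits.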

prepare.py will:

  • download git repos with benign and malicious samples
  • remove administrative .git and .github folders
  • remove files with non-ASCII filenames
  • remove duplicate files
  • remove too small (empty to 4 byte) files
  • remove too large files (files above 200k)
  • re-create .gitignore files (they are empty files, so they get removed by the step above)
cd malware-mega-corpus/scripts
./prepare.py
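The filtering steps listed for prepare.py can be sketched as a single pass over the downloaded files. This is a minimal sketch and not the actual script: the function name `clean_samples` is hypothetical, "200k" is assumed to mean 200 KiB, and duplicates are detected by SHA-256 content hash, which may differ from how prepare.py does it.

```python
import hashlib
from pathlib import Path

MIN_SIZE = 4           # files of 0 to 4 bytes count as "too small"
MAX_SIZE = 200 * 1024  # assumption: "200k" means 200 KiB


def clean_samples(root: Path) -> dict:
    """Delete unwanted files below root and return a removal count per reason."""
    counts = {"non_ascii": 0, "too_small": 0, "too_large": 0, "duplicate": 0}
    seen_hashes = set()
    # sorted() gives a stable order, so the same copy of a duplicate is always kept.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        size = path.stat().st_size
        if not path.name.isascii():
            reason = "non_ascii"
        elif size <= MIN_SIZE:
            reason = "too_small"
        elif size > MAX_SIZE:
            reason = "too_large"
        else:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest not in seen_hashes:
                seen_hashes.add(digest)
                continue  # first copy of this content: keep it
            reason = "duplicate"
        path.unlink()
        counts[reason] += 1
    return counts
```

Note that an empty .gitignore file falls under the "too small" rule here as well, which is consistent with the list above saying those files have to be re-created afterwards.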
  1. Check file count stats
./stats.py

The number of benign files and the number of malware files should be roughly equal. (A 10% difference is OK.)
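The balance check can be written down in a few lines. The sketch below is only an illustration and does not reproduce stats.py: the helper names `count_files` and `is_balanced` are hypothetical, and measuring the difference relative to the larger of the two counts is one possible reading of "10% difference".

```python
from pathlib import Path


def count_files(directory: Path) -> int:
    """Count regular files below directory, recursively."""
    return sum(1 for p in directory.rglob("*") if p.is_file())


def is_balanced(benign: int, malware: int, tolerance: float = 0.10) -> bool:
    """True when the counts differ by at most tolerance, relative to the larger count."""
    larger = max(benign, malware)
    if larger == 0:
        return True  # two empty sets are trivially balanced
    return abs(benign - malware) / larger <= tolerance
```

With the folder structure of this repo, it could be called as `is_balanced(count_files(Path("curated-corpus/benign")), count_files(Path("curated-corpus/malware")))`.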

Folder structure

.
├── curated-corpus
│   ├── benign
│   └── malware
├── raw-malware
│   ├── bash
│   ├── c
│   ├── html
│   ├── java
│   ├── js
│   ├── perl
│   ├── php
│   ├── python
│   ├── ruby
│   └── xml
├── README.md
├── scripts
└── snippets
    ├── bash
    ├── c
    ├── html
    ├── java
    ├── js
    ├── perl
    ├── php
    ├── python
    ├── ruby
    └── xml

curated-corpus

This is the main directory. It is where the curated malware and benign files are located or generated.

How to use

Prepare

First run the following script to initialize the dataset by downloading external source code:

cd ./scripts
./prepare.py

FIXME: from this point on, the text is the old how-to and still needs updating.

You can use this repo to benchmark malware detection. The very first step is to check out the repo. Check it out into a whitelisted area, so that the malware detector does not start quarantining the files. /root is usually a safe place.

First, run 01_copy_files.php to put the files in place:

./01_copy_files.php /home/malware-test

When it is done, you can start a full scan on the directory with the malware engine you are benchmarking.

Once the scan has finished, run 02_compare_files.php:

./02_compare_files.php /home/malware-test

This will give you the benchmark numbers.

To make it easier to find files that were not quarantined, you can run ./delete_empty_dirs.sh. If you add more files from any quarantine, you can use the ./delete_info_files.sh helper to remove the info files.

2021-06-08

    Left files: 4799 from 22164
    After SandboxScanner: 4433
    Cleanup ratio: 78.3%

2021-06-26

    File count: 22102
    Cleaned files count: 682
    Deleted files count: 19835
    Not cleaned files count: 1585
    Total cleaned up files: 20517
    Cleanup ratio is 92.83%
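The 2021-06-26 figures are consistent with a cleanup ratio defined as cleaned plus deleted files divided by the total file count. The snippet below just reproduces that arithmetic; the formula is inferred from the numbers above rather than taken from 02_compare_files.php, and the function name `cleanup_ratio` is hypothetical.

```python
def cleanup_ratio(cleaned: int, deleted: int, total: int) -> float:
    """Percentage of files that were either cleaned or deleted by the scanner."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (cleaned + deleted) / total


# Figures from the 2021-06-26 run: 682 cleaned + 19835 deleted = 20517 of 22102 files.
ratio = cleanup_ratio(cleaned=682, deleted=19835, total=22102)
print(f"Cleanup ratio is {ratio:.2f}%")  # prints "Cleanup ratio is 92.83%"
```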

Save_signitures_from_JSON

This is a script that can sort and save benign and malicious script files from a JSON file.

Usage: Run it from the command line with the command below. Start with "python", then give the path to the script and the path to the JSON file, separated by a space:

python the/root/of/the_file/save_signitures_from_JSON.py the/root/of/the/JSON_file.json

The script only works if the file is in the same folder as the curated-corpus and snippets folders.
