This is a Python script that removes lines from a file (file2) that are present in other files (file1 or combined files).
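As a minimal sketch of the core idea (not the script's actual code), the lines of the other file or files can be loaded into a set and the target lines filtered against it. `remove_present_lines` is a hypothetical name. The sketch assumes exact matching after stripping the trailing newline, and that everything fits in memory.

```python
from typing import Iterable, List, Set


def remove_present_lines(target: Iterable[str], *exclude_sources: Iterable[str]) -> List[str]:
    """Return the lines of `target` that do not appear in any exclude source.

    Hypothetical helper for illustration; lines are compared exactly after
    stripping the trailing newline.
    """
    # Union of all exclude sources (a single file1, or several combined files).
    exclude: Set[str] = set()
    for source in exclude_sources:
        exclude.update(line.rstrip("\n") for line in source)
    # Keep the surviving target lines in their original order.
    return [line for line in target if line.rstrip("\n") not in exclude]


if __name__ == "__main__":
    file2 = ["password\n", "letmein\n", "dragon\n", "123456\n"]
    file1 = ["123456\n", "password\n"]
    extra = ["dragon\n"]
    print(remove_present_lines(file2, file1, extra))  # ['letmein\n']
```

A set gives constant-time average lookups, so the cost is roughly one pass over the exclude lines plus one pass over file2.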
- Removes lines from file2 that also appear in one or more other files.
- Supports processing large files sequentially and in chunks to keep memory usage low.
- Shows progress bars (using the tqdm library) while processing.
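The sequential, chunked mode can be approximated as below: the large file is read a fixed number of lines at a time, and file2 is streamed through each chunk in turn, with the surviving lines written to a temporary file. This is a sketch under stated assumptions, not the script's implementation: `iter_chunks`, `filter_in_chunks` and the default `chunk_size` are hypothetical, files are compared as raw bytes (which sidesteps encoding), and trailing `\r`/`\n` characters are ignored when matching.

```python
import os
import shutil
import tempfile
from itertools import islice
from typing import Iterator, Set


def iter_chunks(path: str, chunk_size: int) -> Iterator[Set[bytes]]:
    """Yield sets of at most `chunk_size` lines (as bytes) from a large file."""
    with open(path, "rb") as f:
        while True:
            chunk = {line.rstrip(b"\r\n") for line in islice(f, chunk_size)}
            if not chunk:  # islice produced nothing: end of file
                return
            yield chunk


def filter_in_chunks(target_path: str, exclude_path: str, output_path: str,
                     chunk_size: int = 1_000_000) -> None:
    """Write to output_path the lines of target_path not found in exclude_path."""
    out_dir = os.path.dirname(os.path.abspath(output_path))
    current = target_path  # file holding the lines that have survived so far
    for chunk in iter_chunks(exclude_path, chunk_size):
        fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "wb") as out, open(current, "rb") as src:
            # Stream line by line; only the current chunk is kept in memory.
            out.writelines(line for line in src
                           if line.rstrip(b"\r\n") not in chunk)
        if current != target_path:
            os.remove(current)  # the previous pass is no longer needed
        current = tmp
    if current != target_path:
        os.replace(current, output_path)  # the last pass becomes the output
    elif os.path.abspath(output_path) != os.path.abspath(target_path):
        shutil.copyfile(target_path, output_path)  # exclude file was empty
```

For example, `filter_in_chunks("file2.txt", "large_file.txt", "file2.txt")` would filter file2 in place. Only one chunk of the large file is held in memory at a time; the trade-off is that the surviving target lines are reread once per chunk.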
Requirements:
- Python 3.x
- tqdm library
- Clone the repository:
  git clone https://github.com/MonstaB/password_wordlist_stuff
- Install dependencies:
  pip install tqdm
The script accepts the following command-line arguments:
- -f, --file1: Path to the first file.
- -c, --combined: Paths to one or more small files, which are combined before comparison.
- -s, --sequential: Paths to one or more large files, which are processed sequentially and in chunks.
- -o, --file2: Path to the second (target) file.
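The flags above could be declared with the standard `argparse` module roughly as follows. This is a hypothetical reconstruction, not the script's code: the README does not say whether the source flags can be mixed, so the sketch assumes exactly one of `-f`, `-c` or `-s` per run, and that `-c` and `-s` accept one or more paths (as in the examples).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented command-line flags."""
    parser = argparse.ArgumentParser(
        description="Remove lines from file2 that are present in other files.")
    # Assumption: exactly one source of lines to remove per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file1", help="Path to the first file.")
    source.add_argument("-c", "--combined", nargs="+",
                        help="Small files to combine before comparison.")
    source.add_argument("-s", "--sequential", nargs="+",
                        help="Large files to process sequentially, in chunks.")
    parser.add_argument("-o", "--file2", required=True,
                        help="Path to the second (target) file.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-c", "a.txt", "b.txt", "-o", "file2.txt"])
    print(args.combined, args.file2)  # ['a.txt', 'b.txt'] file2.txt
```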
To run the script, execute the following command in your terminal:
python script.py -f <file1_path> -o <file2_path>
Replace <file1_path> with the path to the first file or files, and <file2_path> with the path to the second file.
Removing lines from file2 that are present in a single file (file1):
python script.py -f file1.txt -o file2.txt
Removing lines from file2 that are present in multiple files (combined):
python script.py -c file1_1.txt file1_2.txt -o file2.txt
Processing large files sequentially and in chunks:
python script.py -s large_file.txt large_file_2.txt -o file2.txt
The script writes temporary files during processing, which are combined into the final output file (file2) after line removal. It also reports the number of lines stripped from file2, along with the line counts of the original and combined files.
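The temp-file-then-replace step and the line-count report might look like this minimal sketch. `strip_and_report` and its message format are hypothetical. Writing to a temporary file in the same directory and then calling `os.replace` means file2 is only overwritten once the filtered output is complete.

```python
import os
import tempfile
from typing import Set


def strip_and_report(target_path: str, exclude: Set[bytes]) -> int:
    """Rewrite target_path without lines in `exclude`; return lines stripped."""
    out_dir = os.path.dirname(os.path.abspath(target_path))
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    total = kept = 0
    with os.fdopen(fd, "wb") as out, open(target_path, "rb") as src:
        for line in src:
            total += 1
            if line.rstrip(b"\r\n") not in exclude:
                out.write(line)
                kept += 1
    # Swap the finished temporary file into place over the target.
    os.replace(tmp, target_path)
    stripped = total - kept
    print(f"Lines stripped from {target_path}: {stripped} "
          f"(original: {total}, remaining: {kept})")
    return stripped
```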
The script handles encoding issues gracefully by attempting both UTF-8 and Latin-1 encodings when reading files. Large files are processed sequentially and in chunks to limit memory usage. As a guideline, "small" files are those with a combined size of around 1 GB, and "large" files are individually larger than 1 GB. The script has been tested on a machine with 16 GB of RAM, producing an 11 GB output file.
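The encoding handling described above could be sketched as a UTF-8 read with a Latin-1 fallback. The order (UTF-8 first) is an assumption, and `read_lines` is a hypothetical helper. Because Latin-1 maps every byte to a character, the fallback itself cannot raise a decode error.

```python
from typing import List


def read_lines(path: str) -> List[str]:
    """Read a text file as UTF-8, falling back to Latin-1 on decode errors."""
    try:
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    except UnicodeDecodeError:
        # Latin-1 decodes any byte sequence, so this branch always succeeds.
        with open(path, encoding="latin-1") as f:
            return [line.rstrip("\n") for line in f]
```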