dupers

Dupers is the blazing-fast file duplicate checker and filename search tool.

  • Uses SHA-256 checksums stored in a fast key/value database for accurate duplicate detection
  • Safe, automated duplicate deletion with user confirmation
  • Multithreaded file processing for maximum performance
  • Instant filename and directory path search from the database
  • Automated database maintenance with optional user tools
  • Cross-platform support (Windows, macOS, Linux)
  • Import/export database stores as CSV for sharing

Downloads

Dupers is available as standalone portable binaries and system packages. No installation is required for the portable versions.

Portable Binaries

Windows: Download

Linux: Download

macOS: Download

Before use, macOS users will need to delete the 'quarantine' extended attribute that is applied to all program downloads that are not notarized by Apple for a fee.

$ xattr -d com.apple.quarantine dupers

Homebrew

macOS and Linux users can install via Homebrew:

brew tap bengarrett/dupers https://github.com/bengarrett/dupers
brew install bengarrett/dupers/dupers

Update to the latest version with:

brew upgrade bengarrett/dupers/dupers

Linux Packages

dpkg -i dupers_amd64.deb
rpm -i dupers_amd64.rpm
apk add dupers_amd64.apk
pacman -U dupers_amd64.pkg.tar.zst

Quick Start

Get started with dupers in just a few commands:

# Windows users use backslashes in paths, for example: dupers up ~\Documents

# Add your main directories to the database (buckets)
dupers up ~/Documents
dupers up ~/Downloads
dupers up /path/to/your/files

# Find duplicate files
dupers dupe ~/Pictures ~/Documents

# Search for files by name
dupers search "project"

# View database information
dupers database

Example usage

Dupe check

Run a check to find duplicate photos.

dupers dupe ~/photos # Windows example: ~\photos or C:\photos

# dupers        the program name
# dupe          the command to run
# ~/photos     the path containing a collection of files (a bucket)

Run a check to see if the photo exists within the photo collection.

dupers dupe photo.jpg ~/photos # Windows example: ~\photos or C:\photos

# photo.jpg     the new file to check
# ~/photos      the path containing a collection of files (a bucket)

Run a check of the files in Downloads against the collection of stored files.

dupers dupe ~/Downloads ~/storage # Windows example: ~\Downloads C:\storage

# ~/Downloads  the path containing new files to check
# ~/storage    the path containing a collection of files (a bucket)

Dupe check multiple locations

Run a check of the files in Downloads against the collections of documents, music and images.

dupers dupe ~/Downloads ~/documents ~/images ~/music

# Windows example: dupers dupe ~\Downloads ~\Documents D:\images E:\music

# ~/Downloads  the path containing new files to check
# ~/documents  a path containing a collection of files (a bucket)
# ~/images     another path containing a collection of files (another bucket)
# ~/music      another path containing a collection of files (and another bucket)

Search for a filename

Search the database for ZIP files.

Note: options such as -name always go before the command.

dupers -name search .zip

# dupers     the program name
# -name      an option, to search only for filenames
# search     the command to run
# .zip       the search expression

Search the database for photos containing '2010' in their file or directory names.

dupers search "2010" ~/photos # Windows example: D:\photos

# dupers     the program name
# search     the command to run
# "2010"     the search expression
# ~/photos   the path containing a collection of files (a bucket)

Performance

Duplicate checking has to read and checksum every file, so the storage hardware and the operating system directly affect performance.

The fast flag

When running a dupe check, the -fast flag can significantly improve performance on large file collections. It does this by running duplicate checks against the database only and ignoring the files residing on the host system.

Dupe command on a large collection using fast mode takes less than a second 😃
dupers -fast dupe C:\Users\Me\Downloads D:\textfiles
# Scanned 191842 files, taking 901ms
The same dupe command without fast mode takes 46 seconds ☹️
dupers dupe C:\Users\Me\Downloads D:\textfiles
# Checking 51179 of 387859 items...
# Scanned 191842 files, taking 46.3s
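
One way to picture why a database-only check is so much quicker is the hypothetical Go sketch below. It is not the dupers implementation: the `bucket` type, the `fastDupes` function and the shortened checksums are all assumptions for illustration. The point is that matching becomes a key lookup per candidate, and none of the files recorded in the bucket are opened or re-hashed.

```go
package main

import "fmt"

// bucket is a hypothetical stand-in for one database bucket,
// mapping a hex checksum to the path stored for it.
type bucket map[string]string

// fastDupes matches candidate checksums against the bucket records only.
// The files the bucket describes are never opened, so its records are trusted as-is.
func fastDupes(candidates map[string]string, b bucket) map[string]string {
	dupes := make(map[string]string)
	for path, sum := range candidates {
		if stored, ok := b[sum]; ok && stored != path {
			dupes[path] = stored
		}
	}
	return dupes
}

func main() {
	// Placeholder checksums, shortened for readability.
	b := bucket{"9f86d0": `D:\textfiles\readme.txt`}
	candidates := map[string]string{
		`C:\Users\Me\Downloads\readme (1).txt`: "9f86d0",
		`C:\Users\Me\Downloads\notes.txt`:      "2c26b4",
	}
	for path, stored := range fastDupes(candidates, b) {
		fmt.Printf("%s duplicates %s\n", path, stored)
	}
}
```

The trade-off in a design like this is that results are only as current as the database: a file changed or removed since the bucket was last scanned would still be matched against its old record.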

Limitations

Multiple identical files

Both the dupe and search commands only show the first matching file. This is because dupers uses the SHA-256 file checksums as unique keys, and each key holds a single file path.
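
The effect of one path per checksum can be shown with a small Go sketch. It uses an in-memory map as a stand-in for the database, and the `store` type and its `add` method are hypothetical names, not part of dupers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// store keeps exactly one path per checksum, like a key/value bucket
// whose keys are SHA-256 digests (an in-memory stand-in for illustration).
type store map[[sha256.Size]byte]string

// add records path under the checksum of data unless the key is already taken.
// It returns the path already held for that checksum, if any.
func (s store) add(path string, data []byte) (existing string, dup bool) {
	key := sha256.Sum256(data)
	if existing, dup = s[key]; dup {
		return existing, true
	}
	s[key] = path
	return "", false
}

func main() {
	s := store{}
	content := []byte("identical bytes")
	for _, path := range []string{"a.txt", "b.txt", "c.txt"} {
		if first, dup := s.add(path, content); dup {
			fmt.Printf("%s matches %s\n", path, first)
		}
	}
	fmt.Println("paths stored:", len(s))
}
```

In this sketch b.txt and c.txt are each reported as matching a.txt, but only one path is ever stored for the shared checksum, so the relationship between b.txt and c.txt is never listed.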

Command Prompt directories

The legacy Windows Command Prompt (cmd.exe) cannot use trailing backslashes with quoted directories. Windows Terminal does not have this issue.

✔️ Good
dupers dupe "C:\Users\Ben\Some directory"
❌ Incorrect
dupers dupe "C:\Users\Ben\Some directory\"

Troubleshoot

Windows

Not enough memory resources are available to process this command.

This is a misleading, generic Windows error that occurs when interacting with the database. There is no guaranteed fix, but try rebooting or running this command:

# In an administrator console or administrator command prompt.
sfc /scannow

Build

Go supports dozens of architectures and operating systems, allowing dupers to be built for most platforms.

# clone this repo
git clone git@github.com:bengarrett/dupers.git

# access the repo
cd dupers

# target and build the app for the host system
go build

# target and build for OpenBSD
env GOOS=openbsd GOARCH=amd64 go build
