A simple and efficient C++ implementation of three classic sequence alignment algorithms:
- Longest Common Subsequence (LCS)
- Global Alignment (Needleman-Wunsch)
- Local Alignment (Smith-Waterman)
The alignments are visualized with colored base-level matches and mismatches for clarity.
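To make the first of the three algorithms concrete, here is a minimal sketch of the LCS dynamic-programming recurrence. The function name `lcs_length` is hypothetical and is not taken from the project's sources; it only computes the length of the subsequence, not the alignment itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest common subsequence of a and b.
// Keeps two rolling rows, so memory is O(|b|) rather than O(|a| * |b|).
std::size_t lcs_length(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0);
    std::vector<std::size_t> curr(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                // Matching characters extend the diagonal neighbour.
                curr[j] = prev[j - 1] + 1;
            } else {
                // Otherwise carry over the better of "skip a[i]" or "skip b[j]".
                curr[j] = std::max(prev[j], curr[j - 1]);
            }
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

For example, `lcs_length("AGGTAB", "GXTXAYB")` evaluates to 4 (the subsequence `GTAB`).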
Please refer to the API Documentation.
Installing OpenMPI for v2 and v3
To use OpenMPI compilers (mpicc, mpic++, etc.) and mpirun on Fedora:
- Install OpenMPI and development headers:
  sudo dnf install openmpi openmpi-devel
- Enable environment modules:
  source /etc/profile.d/modules.sh
- Load the OpenMPI module:
  module load mpi/openmpi-x86_64
  NOTE: You may need to run the above command before the cmake and make commands.
- Persistent setup (optional): Add the source and module load lines above to your ~/.bashrc to avoid repeating them in each session.
Reference: OpenMPI on Fedora
On Debian/Ubuntu, install OpenMPI and development files with:
sudo apt install libopenmpi-dev
On macOS, install using Homebrew:
brew install openmpi
brew install llvm
For official documentation, see: OpenMPI Quickstart Guide
Follow these steps to ensure all dependencies are installed and the system is configured to handle the
AddressSanitizer (ASan) and libdivsufsort requirements.
We use Micromamba to manage the libdivsufsort and libasan dependencies. This avoids manual compilation errors with
GCC 15.
# 1. Install the SDSL and DivSufSort libraries
mamba install -c conda-forge libdivsufsort
# 2. Identify your environment path for CLion
echo $CONDA_PREFIX
# 3. Install the ASan library on Fedora directly
sudo dnf install libasan
Note: The path returned by step 2 (usually /home/USER/micromamba) is what you will use for CMAKE_PREFIX_PATH.
To make the libraries visible to your project, update the CMake settings in the CLion GUI:
- Open Settings (Ctrl+Alt+S)
- Navigate to: Build, Execution, Deployment > CMake
- In CMake options, paste:
  -DCMAKE_PREFIX_PATH=/home/YOUR_USER/micromamba
- Click Apply, then click the Reload CMake Project icon in the CMake tab.
Fedora 41+ uses a newer version of AddressSanitizer. Since the project links against version 8, you must create a symbolic link (bridge):
# Find your actual system ASan version and link it to version 8
ACTUAL_ASAN=$(ls /usr/lib64/libasan.so.[1-9]* | head -n 1)
sudo ln -sf "$ACTUAL_ASAN" /usr/lib64/libasan.so.8.0.0
To prevent the Illegal Instruction crash caused by conflicts between OpenMPI and AddressSanitizer, add the following environment variables to every Run Configuration (aligner and fmindex):
- Go to: Run > Edit Configurations > Environment Variables
- Paste:
  OMPI_MCA_patcher=^overwrite;ASAN_OPTIONS=protect_shadow_gap=0;LIBRARY_PATH=/home/YOUR_USER/micromamba/lib
- DivSufSort Check:
  ls /home/YOUR_USER/micromamba/lib/libdivsufsort.a
- ASan Check:
  ls -l /usr/lib64/libasan.so.8.0.0
  Should point to your actual system library.
- MPI Check: Ensure OMPI_MCA_patcher=^overwrite is present in your CLion environment variables.
This project consists of two tools: fmindex (for pre-processing) and aligner (for sequence matching). Both require
specific Environment Variables to handle the OpenMPI + AddressSanitizer memory conflict.
For ALL run configurations below, paste this single string into the Environment variables field in CLion (replace abhinavmishra with your own user name):
OMPI_MCA_patcher=^overwrite;ASAN_OPTIONS=protect_shadow_gap=0;LIBRARY_PATH=/home/abhinavmishra/micromamba/lib
Before aligning, you must generate .fmidx files from your FASTA data. Create two CMake Application configurations
in CLion:
- Generate Index 1
  - Target: fmindex
  - Program arguments: files/dna1.fasta -s $
  - Purpose: Generates an index for the query sequence.
- Generate Index 2
  - Target: fmindex
  - Program arguments: files/dna2.fasta -s $
  - Purpose: Generates an index for the target sequence.
Once the .fmidx files are generated in your project root, switch to the aligner configuration:
- Target: aligner
- Program arguments (DNA Test): --query files/dna1.fasta --target files/dna2.fasta --choice 1 --mode dna --verbose
- Working Directory: Must be set to the project root (e.g., /home/abhinavmishra/git/SequenceAligner) so the app can find the /files folder.
| Configuration Name | Target | Program Arguments |
|---|---|---|
| Generate Index 1 | fmindex | files/dna1.fasta -s $ |
| Generate Index 2 | fmindex | files/dna2.fasta -s $ |
| Run DNA Aligner | aligner | --query files/dna1.fasta --target files/dna2.fasta --choice 1 --mode dna --verbose |
If the program crashes immediately upon launch, verify that OMPI_MCA_patcher=^overwrite is present in your *Environment variables*. This prevents OpenMPI from corrupting memory regions monitored by the AddressSanitizer.
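A launch-time check for this setting could look like the sketch below. Both function names are hypothetical and are not part of fmindex or aligner; the sketch only assumes that the variable is passed through the process environment exactly as written above.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when the value disables OpenMPI's "overwrite"
// patcher, i.e. it is exactly the string "^overwrite".
bool patcher_override_ok(const char* value) {
    return value != nullptr && std::strcmp(value, "^overwrite") == 0;
}

// Hypothetical helper: reads OMPI_MCA_patcher from the current environment.
bool environment_ready() {
    return patcher_override_ok(std::getenv("OMPI_MCA_patcher"));
}
```

Calling `environment_ready()` at the top of `main` and printing a warning when it returns false would turn a silent crash into a readable message.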
Each version implements the same interface. Choose the appropriate binary based on the features or optimizations you want to test.
On Debian/Ubuntu and Fedora, the following commands will build the project:
git clone https://github.com/yourusername/SequenceAligner.git
cd SequenceAligner
mkdir build && cd build
cmake ..
make
cd ..
For macOS (Intel Chip), you can use the following commands:
git clone https://github.com/yourusername/SequenceAligner.git
cd SequenceAligner
mkdir build && cd build
cmake .. \
-DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++ \
-DCMAKE_C_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/llvm/include" \
-DCMAKE_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/llvm/include" \
-DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/opt/llvm/lib -lomp"
make
cd ..
For macOS (Apple Silicon), you can use the following commands:
git clone https://github.com/yourusername/SequenceAligner.git
cd SequenceAligner
mkdir build && cd build
cmake .. \
-DCMAKE_C_COMPILER=/opt/homebrew/opt/llvm/bin/clang \
-DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm/bin/clang++ \
-DCMAKE_C_FLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/llvm/include" \
-DCMAKE_CXX_FLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/llvm/include" \
-DCMAKE_EXE_LINKER_FLAGS="-L/opt/homebrew/opt/llvm/lib -lomp"
make
cd ..
This will compile an executable named aligner in the project's directory:
aligner → builds from src/main.cpp
To run on a cluster or server with OpenMPI, use the following command:
mpirun -np <num_processes> ./aligner \
--query <query.fasta> \
--target <target.fasta> \
--choice <1|2|3|4> \
--mode <dna|protein> \
[--outdir <output_directory>] \
[--binary <binary file DP>] \
[--txt <text file DP>] \
[--gap_open <float>] \
[--gap_extend <float>] \
[--verbose]
Note: Use mpirun when you want performance via parallelism, especially for long sequences or many alignments. It can run on multiple CPU cores or even multiple nodes if configured.
To run on a local machine, use the following command:
./aligner \
--query <query.fasta> \
--target <target.fasta> \
--choice <1|2|3|4> \
--mode <dna|protein> \
[--outdir <output_directory>] \
[--binary <binary file DP>] \
[--txt <text file DP>] \
[--gap_open <float>] \
[--gap_extend <float>] \
[--verbose]
If you see the message "FM-index anchoring unavailable/failed. Falling back to MPI full DP" during protein alignment, this is expected behavior, not a bug.
- Exact Matches vs. Similarity: The FM-Index relies on finding exact substring matches (k-mers, typically 5-8 characters long) to build anchors. Distant protein sequences (e.g., <30% identity) often preserve chemical similarity rather than exact character identity, meaning they may only share very short exact matches (2-3 amino acids).
- Smart Fallback: Because the sequences lack exact matches long enough to safely anchor the alignment, the program intentionally skips the FM-Index phase. It gracefully falls back to the full Smith-Waterman or Needleman-Wunsch DP matrix (using BLOSUM62) to ensure a biologically accurate alignment.
- Why not lower the k-mer size? Forcing a tiny k-mer threshold (like k=3) on proteins would produce a flood of random, noisy seeds, hurting both accuracy and performance.
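The exact-match requirement described above can be illustrated with a small sketch. It is not the project's FM-index code: a plain hash set of k-mers stands in for the index, and the name `has_shared_kmer` is hypothetical. It only answers whether two sequences share at least one exact substring of length k, which is the precondition for anchoring.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: true if query and target share at least one exact
// substring of length k. A hash set replaces the FM-index for illustration.
bool has_shared_kmer(const std::string& query, const std::string& target,
                     std::size_t k) {
    assert(k > 0);
    if (query.size() < k || target.size() < k) {
        return false;  // no k-mer fits in the shorter sequence
    }
    std::unordered_set<std::string> seen;
    for (std::size_t i = 0; i + k <= query.size(); ++i) {
        seen.insert(query.substr(i, k));
    }
    for (std::size_t j = 0; j + k <= target.size(); ++j) {
        if (seen.count(target.substr(j, k)) > 0) {
            return true;  // a seed exists, so anchoring could proceed
        }
    }
    return false;  // no seed: a caller would fall back to full DP
}
```

With the made-up peptides `MKVLAAGIV` and `MRVLSAGLV`, the check fails for k=5 and succeeds for k=2, which mirrors the situation in which the program falls back to the full DP matrix.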
| Option | Description |
|---|---|
| --query | Path to the query FASTA file |
| --target | Path to the target FASTA file |
| --choice | Alignment method: 1 = global, 2 = local, 3 = LCS, 4 = all |
| --mode | Scoring mode: dna (uses EDNAFULL) or protein (uses BLOSUM62) |
| --outdir (opt) | Output directory (default is current directory .) |
| --binary (opt) | Output binary file for dynamic programming matrix (default: dp.bin) |
| --txt (opt) | Output text file for dynamic programming matrix (default: dp.txt) |
| --gap_open (opt) | Gap opening penalty (default: -5.0) |
| --gap_extend (opt) | Gap extension penalty (default: -1.0) |
| --verbose (opt) | Show colored alignment and progress bars |
| --help | Show help and usage instructions |
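To show how the two gap options interact, the following sketch computes a global alignment score with affine gaps (Gotoh's three-matrix variant of Needleman-Wunsch). Several things here are assumptions and not confirmed by the project: the function name `global_affine_score`, the convention that a gap of length L scores `gap_open + (L - 1) * gap_extend`, and the flat +5/-4 match/mismatch score used in place of a full EDNAFULL or BLOSUM62 lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: global alignment score with affine gap penalties.
// Assumed convention: the first gap position scores gap_open, every
// further position in the same gap scores gap_extend.
double global_affine_score(const std::string& a, const std::string& b,
                           double gap_open = -5.0, double gap_extend = -1.0,
                           double match = 5.0, double mismatch = -4.0) {
    assert(gap_open <= 0.0 && gap_extend <= 0.0);
    const double NEG = -std::numeric_limits<double>::infinity();
    const std::size_t n = a.size(), m = b.size();
    using Matrix = std::vector<std::vector<double>>;
    Matrix M(n + 1, std::vector<double>(m + 1, NEG));  // ends in a[i] vs b[j]
    Matrix X = M;  // ends with a[i] aligned to a gap
    Matrix Y = M;  // ends with b[j] aligned to a gap
    M[0][0] = 0.0;
    for (std::size_t i = 1; i <= n; ++i) {
        X[i][0] = gap_open + static_cast<double>(i - 1) * gap_extend;
    }
    for (std::size_t j = 1; j <= m; ++j) {
        Y[0][j] = gap_open + static_cast<double>(j - 1) * gap_extend;
    }
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const double s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            M[i][j] = s + std::max({M[i - 1][j - 1], X[i - 1][j - 1],
                                    Y[i - 1][j - 1]});
            // Open a new gap from M or Y, or extend an existing gap in X.
            X[i][j] = std::max({M[i - 1][j] + gap_open,
                                X[i - 1][j] + gap_extend,
                                Y[i - 1][j] + gap_open});
            Y[i][j] = std::max({M[i][j - 1] + gap_open,
                                Y[i][j - 1] + gap_extend,
                                X[i][j - 1] + gap_open});
        }
    }
    return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

Under these assumptions, aligning `ACGT` with `AGT` scores 10: three matches (+15) and one opened gap (-5). Aligning `AAAA` with `A` scores -2: one match (+5) and a single gap of length three (-5 - 1 - 1).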
The program accepts standard FASTA files.
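For readers unfamiliar with the format, here is a minimal sketch of reading FASTA records: a header line starting with `>` followed by one or more sequence lines. The `FastaRecord` type and `parse_fasta` function are hypothetical names for illustration and do not describe the project's own parser.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one FASTA entry.
struct FastaRecord {
    std::string header;    // text after '>' on the header line
    std::string sequence;  // all sequence lines concatenated
};

// Hypothetical helper: reads every record from a stream.
std::vector<FastaRecord> parse_fasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;  // skip blank lines
        }
        if (line[0] == '>') {
            records.push_back({line.substr(1), ""});
        } else if (!records.empty()) {
            records.back().sequence += line;
        }
        // Sequence lines that appear before any header are ignored.
    }
    return records;
}
```

The function accepts any `std::istream`, so it works with a `std::ifstream` opened on a file such as `files/dna1.fasta` or with a `std::istringstream` in a unit test.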
This project was originally developed in 2014 as part of the Biological Computation course during my Bachelor's in Bioinformatics at JUIT, Solan. It was later revisited and optimized for better performance, readability, and maintainability.
This project is licensed under the BSD 3-Clause License.
© 2025 Abhinav Mishra.
