J Mol Biol. 1996 Jun 21;259(4):840-54.
doi: 10.1006/jmbi.1996.0362.

FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory

I Ladunga et al. J Mol Biol. 1996.

Abstract

We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web Server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/fastapat.html).
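
As an illustration only, the two ideas in the abstract (a compact binary code for the residue combination in an aligned column, and log-odds scores that depend on how conserved the column is) can be sketched in Python. Nothing below is taken from FASTA-SWAP or FASTA-PAT: the 20-bit mask layout, the uniform background frequencies, the pseudocount rule (weight proportional to the number of distinct residues in the column), and all function names are assumptions made for this example, not the paper's scoring matrices.

```python
import math

# The 20 standard amino acids; the bit order is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}

def column_mask(column):
    """Encode the set of residues seen in one aligned column as a 20-bit integer."""
    mask = 0
    for aa in column:
        mask |= BIT[aa]
    return mask

def mask_to_residues(mask):
    """Decode a 20-bit mask back into its residues, in AMINO_ACIDS order."""
    return "".join(aa for aa in AMINO_ACIDS if mask & BIT[aa])

def log_odds_scores(column, background, weight=1.0):
    """Log-odds score (in bits) of every amino acid against one aligned column.

    Assumed pseudocount rule: the total pseudocount is weight times the number
    of distinct residues observed, distributed according to the background.
    """
    pseudo_total = weight * len(set(column))
    counts = {aa: pseudo_total * background[aa] for aa in AMINO_ACIDS}
    for aa in column:
        counts[aa] += 1.0
    total = sum(counts.values())
    return {aa: math.log2((counts[aa] / total) / background[aa])
            for aa in AMINO_ACIDS}

# Hypothetical uniform background; real searches would use database frequencies.
background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved = log_odds_scores("LLLLLLLL", background)  # one residue in all 8 sequences
variable = log_odds_scores("LIVMFAGS", background)   # 8 different residues

print(mask_to_residues(column_mask("VLI")))
print(round(conserved["L"], 2), round(variable["L"], 2))  # match scores
print(round(conserved["W"], 2), round(variable["W"], 2))  # mismatch scores
```

Under these assumptions, a leucine match scores higher against the conserved column than against the variable one, and a tryptophan mismatch is penalized more heavily at the conserved column, which is the qualitative behavior the abstract describes for pattern scoring.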
