FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory
- PMID: 8683587
- DOI: 10.1006/jmbi.1996.0362
Abstract
We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web Server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/fastapat.html).
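The two ideas the abstract leans on, a compact binary encoding of the residues seen in an aligned column and a log-odds score that is more extreme at conserved positions, can be illustrated with a small sketch. This is not the authors' implementation or their scoring matrices. It assumes a 20-bit mask per pattern position, a uniform background over the 20 amino acids, and a hypothetical parameter `eps` for the chance that a true family member shows a residue outside the observed set. All function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One bit per amino acid, so any combination of residues fits in 20 bits.
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}


def encode_column(residues):
    """Pack the set of residues observed in one aligned column into an integer."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def encode_pattern(aligned_seqs):
    """Encode each column of an ungapped multiple alignment as a bitmask."""
    return [encode_column(column) for column in zip(*aligned_seqs)]


def position_score(mask, residue, eps=0.05):
    """Toy log-odds score (in bits) of a query residue against one pattern position.

    Assumptions (not from the paper): uniform background frequencies, and a
    related sequence falls outside the observed residue set with probability eps.
    """
    k = bin(mask).count("1")       # number of residues allowed at this position
    if k == len(AMINO_ACIDS):      # every residue allowed: position is uninformative
        return 0.0
    q = k / len(AMINO_ACIDS)       # background chance of hitting the set
    if BIT.get(residue, 0) & mask:
        return math.log2((1.0 - eps) / q)   # match
    return math.log2(eps / (1.0 - q))       # mismatch


def ungapped_score(pattern, query, offset=0):
    """Sum position scores along one diagonal (query shifted by offset)."""
    segment = query[offset:offset + len(pattern)]
    return sum(position_score(m, aa) for m, aa in zip(pattern, segment))


# Usage: three aligned sequences give masks {A}, {C}, {D,E}, {K,R}.
pattern = encode_pattern(["ACDK", "ACEK", "ACDR"])
conserved_match = position_score(pattern[0], "A")
variable_match = position_score(pattern[2], "D")
conserved_mismatch = position_score(pattern[0], "G")
variable_mismatch = position_score(pattern[2], "G")
```

Under these assumptions a match at the fully conserved first column scores higher than a match at the variable third column, and a mismatch at the conserved column scores lower than a mismatch at the variable one. This is the qualitative behavior the abstract describes. A diagonal scan or dynamic programming refinement would be built on top of `position_score`.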