Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified
- PMID: 16563161
- PMCID: PMC1435933
- DOI: 10.1186/1471-2148-6-29
Abstract
Background: In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.
Results: We start by highlighting the potential dangers of arbitrarily choosing protein models, using an empirical example in which a single alignment produces two topologically different and strongly supported phylogenies under two different, arbitrarily chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three domains of life, and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.
Conclusion: This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
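As an illustration of what statistical selection among candidate matrices can look like in practice, the sketch below ranks models by the Akaike information criterion (AIC) and converts the scores to Akaike weights. The abstract does not state which criteria were used, so AIC is an assumption here, and the log-likelihoods, taxon count, and parameter counts are hypothetical values chosen only for the example. In a real analysis they would come from maximum likelihood fits of each model to the same alignment.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_scores):
    """Convert AIC scores into relative model weights that sum to 1."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical setup: 10 taxa, so an unrooted tree has 2*10 - 3 = 17 branch lengths.
N_BRANCHES = 2 * 10 - 3

# Hypothetical (log-likelihood, extra free parameters) for each candidate model.
# A fixed empirical matrix adds no free parameters; "+G" adds one gamma shape.
candidates = {
    "JTT": (-10250.0, 0),
    "WAG": (-10240.0, 0),
    "RtREV": (-10235.0, 0),
    "RtREV+G": (-10100.0, 1),
}

scores = {
    name: aic(lnl, N_BRANCHES + extra)
    for name, (lnl, extra) in candidates.items()
}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name:8s} AIC={scores[name]:.1f} weight={weights[name]:.3g}")
```

Because every candidate is scored on the same alignment, the ranking depends only on fit and parameter count, not on the taxonomic source of the matrix, which is the point the conclusion makes.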