Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified
- PMID: 16563161
- PMCID: PMC1435933
- DOI: 10.1186/1471-2148-6-29
Abstract
Background: In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown that adequate substitution models are important for producing accurate phylogenies. Many empirical models of amino acid substitution have been derived using a variety of methods and protein datasets. These matrices are normally used as surrogates, in place of a maximum likelihood model derived from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.
Results: We start by highlighting the potential dangers of arbitrarily choosing protein models with an empirical example in which a single alignment produces two topologically different and strongly supported phylogenies under two different arbitrarily chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three domains of life, and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.
Conclusion: These results demonstrate that choosing protein models based on their source or method of construction may not be appropriate.
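The idea of selecting a matrix statistically instead of by its source can be illustrated with a minimal sketch. It assumes the Akaike information criterion (AIC) as the selection statistic, which the abstract does not specify, and it uses invented log-likelihoods and parameter counts for four candidate models on a hypothetical 10-taxon tree. The function names aic and rank_models are illustrative only and do not come from the paper or from any phylogenetics package.

```python
import math


def aic(log_likelihood, n_free_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_free_params - 2 * log_likelihood


def rank_models(fits):
    """Rank candidate models by AIC.

    fits maps a model name to (log-likelihood, number of free parameters).
    Returns a list of (name, AIC, delta AIC, Akaike weight), best model first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    # Relative likelihood of each model given the data, then normalised to weights.
    rel = {name: math.exp(-0.5 * (score - best)) for name, score in scores.items()}
    total = sum(rel.values())
    rows = [(name, scores[name], scores[name] - best, rel[name] / total)
            for name in scores]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical values, NOT results from the paper. A 10-taxon unrooted tree has
# 17 branch lengths; +G adds one shape parameter and +F adds 19 frequencies.
hypothetical_fits = {
    "WAG": (-5230.4, 17),
    "WAG+G": (-5198.1, 18),
    "rtREV+G": (-5190.7, 18),
    "JTT+G+F": (-5185.0, 37),
}

ranking = rank_models(hypothetical_fits)
for name, score, delta, weight in ranking:
    print(f"{name:10s} AIC={score:10.1f} dAIC={delta:6.1f} weight={weight:.3f}")
```

In this invented example the model with the highest raw likelihood is not the one selected, because its extra frequency parameters are penalised. In practice the log-likelihoods would come from fitting each matrix to the alignment with a maximum likelihood program.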
Similar articles
- Efficient methods for estimating amino acid replacement rates. J Mol Evol. 2006 Jun;62(6):663-73. doi: 10.1007/s00239-004-0113-9. Epub 2006 Apr 28. PMID: 16752207
- An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation. Mol Biol Evol. 2014 Apr;31(4):779-92. doi: 10.1093/molbev/msu044. Epub 2014 Jan 16. PMID: 24441033
- MtOrt: an empirical mitochondrial amino acid substitution model for evolutionary studies of Orthoptera insects. BMC Evol Biol. 2020 May 19;20(1):57. doi: 10.1186/s12862-020-01623-6. PMID: 32429841 Free PMC article.
- Models of molecular evolution and phylogeny. Genome Res. 1998 Dec;8(12):1233-44. doi: 10.1101/gr.8.12.1233. PMID: 9872979 Review.
- Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Struct Funct Genomics. 2003;3(1-4):201-12. PMID: 12836699 Review.
Cited by
- Molecular identification of Bursaphelenchus cocophilus associated to oil palm (Elaeis guineensis) crops in Tibu (North Santander, Colombia). J Nematol. 2020 Nov 30;52:e2020-117. doi: 10.2130/jofnem-2020-117. eCollection 2020. PMID: 33829199 Free PMC article.
- Horizontal transfer of a subtilisin gene from plants into an ancestor of the plant pathogenic fungal genus Colletotrichum. PLoS One. 2013;8(3):e59078. doi: 10.1371/journal.pone.0059078. Epub 2013 Mar 15. PMID: 23554975 Free PMC article.
- Dynamic evolution of venom proteins in squamate reptiles. Nat Commun. 2012;3:1066. doi: 10.1038/ncomms2065. PMID: 22990862
- PtoMYB031, the R2R3 MYB transcription factor involved in secondary cell wall biosynthesis in poplar. Front Plant Sci. 2024 Jan 17;14:1341245. doi: 10.3389/fpls.2023.1341245. eCollection 2023. PMID: 38298604 Free PMC article.
- Broad distribution of TPI-GAPDH fusion proteins among eukaryotes: evidence for glycolytic reactions in the mitochondrion? PLoS One. 2012;7(12):e52340. doi: 10.1371/journal.pone.0052340. Epub 2012 Dec 20. PMID: 23284996 Free PMC article.