monte carlo entropy minimization for phylogenetic trees

dannovikov/mcem

mcem

Monte Carlo Entropy Minimization

This project explores how to obtain minimum-entropy phylogenetic trees. It contains a Monte Carlo entropy minimization algorithm and an experiment that tests how minimizing entropy affects the stability of trees under small perturbations of the input data.


monte_carlo_entropy.py takes a FASTA file of aligned sequences and a phylogenetic tree in Newick format. It then repeatedly moves randomly chosen subtrees to new positions, accepting only the changes that reduce the total tree entropy. It returns a phylogenetic tree whose entropy is a local minimum.
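The accept-if-lower loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in monte_carlo_entropy.py: it assumes a binary tree stored as nested 2-tuples with leaf names as strings, it assumes the random move is a prune-and-regraft of one subtree, and the names `random_move`, `minimize`, and the `score` callable are hypothetical.

```python
import random

def node_paths(tree, path=()):
    """Yield the index path to every node; leaves are strings, internal nodes are 2-tuples."""
    yield path
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            yield from node_paths(child, path + (i,))

def get_node(tree, path):
    """Follow an index path from the root down to a node."""
    for i in path:
        tree = tree[i]
    return tree

def prune(tree, path):
    """Remove the node at `path`; its parent collapses into the sibling."""
    if len(path) == 1:
        return tree[1 - path[0]]
    children = list(tree)
    children[path[0]] = prune(tree[path[0]], path[1:])
    return tuple(children)

def graft(tree, path, subtree):
    """Attach `subtree` as the new sibling of the node at `path`."""
    if not path:
        return (tree, subtree)
    children = list(tree)
    children[path[0]] = graft(tree[path[0]], path[1:], subtree)
    return tuple(children)

def random_move(tree, rng):
    """Assumed move: cut out a random non-root subtree and reattach it elsewhere."""
    source = rng.choice([p for p in node_paths(tree) if p])  # never the root
    subtree = get_node(tree, source)
    remainder = prune(tree, source)
    target = rng.choice(list(node_paths(remainder)))
    return graft(remainder, target, subtree)

def minimize(tree, score, iterations=1000, seed=0):
    """Keep a candidate only when it strictly lowers the score."""
    rng = random.Random(seed)
    best, best_score = tree, score(tree)
    for _ in range(iterations):
        candidate = random_move(best, rng)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Here `score` would be a function that maps a tree to its total entropy. Because only improving moves are accepted, the result depends on the starting tree and the random seed, which is why the output is a local rather than a global minimum.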

The definition of tree entropy is not settled. Currently we treat each internal node as a cluster of the sequences in its descendant leaf nodes, and define tree entropy as the sum of entropy(cluster) * size(cluster) over all clusters, where size denotes the number of descendant leaves. The entropy of a cluster of aligned sequences is computed column-wise from the nucleotide frequencies.
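As a rough illustration of this definition, the sketch below computes the weighted sum for a small tree. Several details are assumptions rather than the project's choices: Shannon entropy in bits, per-column entropies added together to get the cluster entropy, gap characters counted like any other symbol, and a tree stored as nested tuples with leaf names as strings. The function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the symbol frequencies in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def cluster_entropy(sequences):
    """Assumed aggregation: add up the entropies of all alignment columns."""
    return sum(column_entropy(col) for col in zip(*sequences))

def tree_entropy(tree, seqs):
    """Return (leaf names, sum of entropy(cluster) * size(cluster)) for `tree`.

    `tree` is a leaf name or a tuple of subtrees; `seqs` maps leaf name to
    its aligned sequence.
    """
    if isinstance(tree, str):
        return [tree], 0.0  # a single leaf is not a cluster
    leaves, total = [], 0.0
    for child in tree:
        child_leaves, child_total = tree_entropy(child, seqs)
        leaves += child_leaves
        total += child_total
    total += cluster_entropy([seqs[name] for name in leaves]) * len(leaves)
    return leaves, total

# Toy alignment: "a" and "b" are identical, "c" differs in the second column.
seqs = {"a": "AC", "b": "AC", "c": "AT"}
_, grouped = tree_entropy((("a", "b"), "c"), seqs)
_, split = tree_entropy((("a", "c"), "b"), seqs)
print(grouped < split)  # True: clustering the identical sequences scores lower
```

Weighting by cluster size means a mixed cluster near the root costs more than the same mix deep in the tree, so the measure favors trees that group similar sequences early.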


tree_stability.py is an experiment that measures how much a tree changes when a small percentage of nucleotides in the input is randomly perturbed. It takes a FASTA file and perturbs some of its nucleotides, then computes SPHERE and RAxML trees on both the original and the perturbed sequences. Next, it runs monte_carlo_entropy.py to minimize the entropy of each tree. Finally, it computes the Robinson-Foulds distance between each pair of trees, covering both methods, original and perturbed input, and trees before and after entropy minimization. It also reports the parsimony score of each tree.
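Two steps of this experiment, the perturbation and the Robinson-Foulds comparison, can be sketched as below. This is only an illustration under stated assumptions, not what tree_stability.py does internally: each A/C/G/T position is assumed to be substituted independently with probability `fraction`, trees are nested tuples with leaf names as strings, and the distance is the plain unweighted count of nontrivial splits found in one tree but not the other. The names `perturb`, `splits`, and `rf_distance` are hypothetical.

```python
import random

def perturb(seqs, fraction, seed=0):
    """Substitute each A/C/G/T with a different base with probability `fraction`."""
    rng = random.Random(seed)
    perturbed = {}
    for name, seq in seqs.items():
        chars = list(seq)
        for i, ch in enumerate(chars):
            if ch in "ACGT" and rng.random() < fraction:
                chars[i] = rng.choice([b for b in "ACGT" if b != ch])
        perturbed[name] = "".join(chars)
    return perturbed

def leaf_set(tree):
    """All leaf names under `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Nontrivial leaf bipartitions, each stored as the side without a reference leaf."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep the split only if both sides contain at least two leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))

t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t1), rf_distance(t1, t2))  # 0 2
```

Because splits are compared rather than rooted clades, two trees that differ only in where the root is placed get a distance of 0, which matches the usual unrooted definition of the Robinson-Foulds distance.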
