The Number of Reversing Substitutions

A basic task in molecular evolution studies is to calculate the frequency of a particular event in evolutionary history. A reversing substitution is an example of such a molecular event: at some moment in the past the direct amino acid substitution A → B occurred, and after a certain period of time we observe the reversing substitution B → A. Unfortunately, in most cases, with the possible exception of experimental evolution in bacteria, we don't know the intermediate (ancestral) states of a protein. We can only observe the proteins of human, mouse, dog, elephant and other species in their current state, in the form of a multiple alignment of orthologous protein-coding genes. However, we can reconstruct the ancestral states at the internal nodes of the phylogenetic tree using the amino acids observed at the tips of the tree and the tree topology itself. There are a variety of methods (maximum parsimony, maximum likelihood, Bayesian methods) and programs (PAML, Phylip, PAUP) for doing so. Given the ancestral and terminal amino acids at a site, we can infer the substitutions: a substitution is inferred on a branch wherever the states at its two ends differ.
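Inferring substitutions from reconstructed states amounts to comparing, site by site, the sequence at each node with the sequence at its parent. A minimal sketch, assuming the tree is stored as a hypothetical child → parent dictionary and that every node (internal nodes included) has an aligned, gap-free sequence; the node names and sequences below are made up for illustration.

```python
def infer_substitutions(parent_of, seqs):
    """Return (site, parent, child, from_aa, to_aa) for every site where the
    states at the two ends of a branch differ.

    parent_of: hypothetical child -> parent mapping for every non-root node.
    seqs: node name -> aligned sequence (assumed gap-free, equal lengths).
    """
    subs = []
    for child, parent in parent_of.items():
        s_parent, s_child = seqs[parent], seqs[child]
        for site, (a, b) in enumerate(zip(s_parent, s_child)):
            if a != b:
                subs.append((site, parent, child, a, b))
    return subs


# Toy tree: root -> N1 -> {human, mouse}, root -> dog
parent_of = {"N1": "root", "human": "N1", "mouse": "N1", "dog": "root"}
seqs = {"root": "MKV", "N1": "MRV", "human": "MKV", "mouse": "MRV", "dog": "MKL"}
subs = infer_substitutions(parent_of, seqs)
print(subs)
# [(1, 'root', 'N1', 'K', 'R'), (1, 'N1', 'human', 'R', 'K'), (2, 'root', 'dog', 'V', 'L')]
```

This simple comparison sees at most one substitution per branch and site; several hits on the same branch remain invisible.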

Problem. Given a multiple alignment with reconstructed internal (ancestral) states and the corresponding phylogenetic tree, calculate the number of reversing substitutions for different distances between the direct and the reversing substitution.
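One possible reading of the problem can be sketched as follows. It assumes (the statement does not say this) that distance is counted in branches, that a substitution X → Y on a branch is matched with the nearest earlier substitution at the same site on the path toward the root, and that the pair is a reversal when that earlier substitution was Y → X. Distance 1 then means the reversal sits on the branch directly below the direct substitution. The dictionary layout and the function name `count_reversals` are hypothetical.

```python
from collections import Counter


def count_reversals(parent_of, seqs):
    """Count reversing substitutions keyed by distance in branches.

    parent_of: hypothetical child -> parent mapping for every non-root node.
    seqs: node name -> aligned sequence (assumed gap-free, equal lengths).
    """
    counts = Counter()
    for child, parent in parent_of.items():
        for site in range(len(seqs[child])):
            a, b = seqs[parent][site], seqs[child][site]
            if a == b:
                continue  # no substitution at this site on parent -> child
            # Walk toward the root to find the nearest earlier substitution
            # at the same site on this lineage.
            node, dist = parent, 0
            while node in parent_of:
                dist += 1
                up = parent_of[node]
                x, y = seqs[up][site], seqs[node][site]
                if x != y:
                    # Earlier substitution x -> y (here y == a); the later
                    # a -> b reverses it exactly when x == b.
                    if (x, y) == (b, a):
                        counts[dist] += 1
                    break
                node = up
    return counts


# Toy tree: root -> N1 -> N2 -> {human, chimp}, N1 -> mouse, root -> dog
parent_of = {"N1": "root", "N2": "N1", "human": "N2", "chimp": "N2",
             "mouse": "N1", "dog": "root"}
seqs = {"root": "AK", "N1": "SK", "N2": "SR", "human": "AK",
        "chimp": "SR", "mouse": "AK", "dog": "AK"}
counts = count_reversals(parent_of, seqs)
print(sorted(counts.items()))  # [(1, 2), (2, 1)]
```

In this toy example, site 0 carries A → S on root → N1, reversed by S → A on N1 → mouse (distance 1) and on N2 → human (distance 2); site 1 carries K → R on N1 → N2, reversed on N2 → human (distance 1). A single direct substitution can therefore be reversed independently in several descendant lineages, and each reversal is counted. Measuring distance by summed branch lengths instead of branch counts would be an equally plausible reading.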

Solving this problem does not require a sophisticated algorithm, but it is a simplified example of a real-world problem in molecular evolution. It contains the basic concepts: the site, the phylogenetic tree, the multiple alignment, the correspondence between the two, and the inference of substitution events.
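The correspondence between the tree and the alignment usually comes down to node names: every node of the tree, internal ones included, should match exactly one row of the alignment. A minimal sketch, assuming the tree is given in Newick format with every internal node labeled and the alignment in FASTA format with one row per node; neither format is fixed by the problem statement. `parse_newick` and `read_fasta` are hypothetical helpers written for illustration, not calls from any library, and they do not handle quoted labels or Newick comments.

```python
def parse_newick(text):
    """Return (root_name, child -> parent dict) for a Newick string in which
    every node, internal ones included, has a name. Branch lengths are skipped."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0
    parent_of = {}

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # skip ":branch_length"
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(read_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(read_node())
            pos += 1  # consume ')'
        name = read_label()
        for child in children:
            parent_of[child] = name
        return name

    root = read_node()
    return root, parent_of


def read_fasta(text):
    """Return name -> sequence for FASTA text (name = first word of the header)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


tree = "(((human,chimp)N2,mouse)N1,dog)root;"
fasta = """>root
AK
>N1
SK
>N2
SR
>human
AK
>chimp
SR
>mouse
AK
>dog
AK
"""
root, parent_of = parse_newick(tree)
seqs = read_fasta(fasta)
nodes = {root} | set(parent_of)
missing = nodes - set(seqs)  # tree nodes without an alignment row
extra = set(seqs) - nodes    # alignment rows without a tree node
print(root, sorted(missing), sorted(extra))
```

Checking both `missing` and `extra` catches the two common mismatches: an internal node whose reconstructed sequence was not exported, and a sequence whose name differs from its label in the tree.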

