Program T-REX (Tree and reticulogram reconstruction)


Vladimir Makarenkov (with contributions from Alix Boc, Philippe Casgrain, Abdoulaye Baniré Diallo, Alpha Boubacar Diallo, Olivier Gascuel, Alain Guénoche, Pierre-Alexandre Landry, François-Joseph Lapointe, Bruno Leclerc, Pierre Legendre, Etienne Lord and Abdellah Mazouzi)

May 2012
Département d'Informatique
Université du Québec à Montréal

References:

Boc, A., Diallo, Alpha B. and Makarenkov, V. (2012), T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks, Nucleic Acids Research, 40 (W1), W573-579.

Makarenkov, V. 2001.T-Rex: Reconstructing and visualizing phylogenetic trees and reticulation networks, Bioinformatics 17, 664-668.

The up-to-date Web Server version of T-REX is available hereNew!

What does the T-REX web server do?

The T-REX web server allows users to carry out several popular algorithms for inferring and validating phylogenetic trees and networks, and performing other related bioinformatics tasks. A number of new algorithms developed in our laboratory have been also added to T-REX. The up-to-date web server version of T-REX includes:

 

1)     Methods for the visualization, drawing and interactive manipulation of phylogenetic trees (using Hierarchical, Radial and Axial types of drawing). For instance, the Newick Viewer application allows users to visualize a phylogenetic tree coded by its Newick string and the Newick Builder application allows users to draw a phylogenetic tree and then save it as a Newick string.

 

2)     Inference of phylogenetic trees using distances (NJ, NINJA, BioNJ, UNJ, ADDTREE, MW, FITCH, Circular order reconstruction), maximum likelihood (PHYML, RAxML and ML methods from PHYLIP) and maximum parsimony (MP methods from PHYLIP). T-Rex also carries out bootstrap and jackknife resampling to assess strength of support of the tree or network branches.

With tree distance methods a tree metric (i.e. additive distance) is fitted to the given dissimilarity matrix that include evolutionary distances between the considered species (i.e. taxa, objects). For instance, the program carries out six distance-based methods for fitting a tree metric (distance representable by a tree with non-negative branch lengths) to the given dissimilarity matrix. The following distance methods are available:

·       ADDTREE by Sattath and Tversky (1977);

·       Neighbor-joining (NJ) method by Saitou and Nei (1987);

·       NINJA large-scale neighbor-joining by Wheeler (2009);

·       BioNeighbor-joining (BioNJ) method by Gascuel (1997);

·       Unweighted neighbor-joining method (UNJ) by Gascuel (1997);

·       FITCH method from the PHYLIP package by Felsenstein (1989);

·       Circular order reconstruction method by Makarenkov and Leclerc (1997), and Yushmanov (1984);

·       Weighted least-squares method MW by Makarenkov (1999), and Makarenkov and Leclerc (1999).

The first three methods, NJ (including NINJA) and ADDTREE, are among the most frequently used methods for inferring phylogenetic trees. The third and forth method, called BioNJ and UNJ, use the same that NJ selection criterion, but different estimation and reduction formulae to infer the tree. The fifth method reconstructs a tree using circular orders of elements associated with a given dissimilarity. This fitting method, presented in Makarenkov and Leclerc (1997), was inspired by Yushmanov's (1984) paper which introduced the concept of circular orders of elements corresponding to the circular (say, clockwise) scanning of leaves of a tree drawing on the plane. The sixth method, called MW, looks for the best additive tree with respect to the given dissimilarity and weight matrices. This method allows for arbitrary weights which may be chosen according to one of the classical weighting models proposed in the literature. The tree obtained by any of the six above-mentioned methods is then polished by means of the procedure of quadratic approximation (see Barthélemy and Guénoche, 1991 in the unweighted case or Makarenkov and Leclerc, 1999 in the weighted case) of branch lengths in order to improve the value of the least-squares criterion and to avoid negative branch lengths.

3)     T-REX also allows one to reconstruct phylogenetic trees from a distance matrix containing missing values (i.e. incomplete matrix). The following four fitting methods are available:

4)     As far as reticulogram inferring (i.e. reticulated cladogram or network) is concerned, the program first builds a supporting phylogenetic tree using one of the available tree inferring methods. Following that step, a reticulation (a new branch) that minimizes the least-squares or weighted least-squares loss function is added to the network at each step of the inferring procedure. Two statistical criteria (Q1 and Q2) have been proposed in order to measure the gain in fit when reticulations are added. The minimum of each of these criteria may suggest a stopping rule for the addition of reticulations. Thus, the user can either select an appropriate criterion to stop the procedure of adding reticulations or indicate an exact number of reticulations to be added to the reticulogram. For a detailed description of the reticulogram reconstruction method, the user is referred to the papers by Legendre and Makarenkov (2002) and Makarenkov and Legendre (2004).

5)     When HGT (horizontal gene transfer) detection option is selected, the program infers an optimal (i.e. minimum-cost) scenario of horizontal (i.e. lateral) gene transfers reconciling a given pair of species and gene trees (for more detail, see Boc, Philippe and Makarenkov, 2010). Statistical validation of the obtained gene transfers by bootstrapping can be also performed. HGT Consensus, Parallel and Interactive versions of the HGT-Detection algorithm are also available. A consol version of the HGT-Detection program allowing for detecting partial horizontal gene transfer events (when only a part of the gene is transferred and acquired by the host allele through intragenic recombination) is also provided.

6)     MAFFT and MUSCLE algorithm, which is one of the most widely used multiple sequence alignment tools, is available.

7)     The following popular Sequence to Distance transformations:

Uncorrected, Jukes-Cantor, Tajima-Nei, Kimura 2-parameters, Tamura, Jin-Nei gamma, Kimura protein, LogDet, F84, WAG, JTT and LG.

are made available.

8)     Computation of the Robison and Foulds topological distance between two (or more) phylogenetic trees can be carried out.

9)     Tree format conversion - from Newick to Distance matrix format and vice versa - is provided.

10) Generation of a specified number of random phylogenetic trees with a specified number of leaves (i.e., species, taxa) and a predefined average branch length according the procedure described by Kuhner and Felsenstein (1994).

As results the program provides:

  • Fitted phylogenetic or reticulogram distance matrix;
  • List of the tree, reticulogram or HGT branches with their lengths and bootstrap scores;
  • For the tree reconstruction – the values of the (weighted) least-squares coefficient, (weighted) average absolute difference, (weighted) maximum absolute difference and total lengths of the obtained tree. If the reticulogram reconstruction is performed, the program also provides the values of the least-squares criterion, as well as the values of the selected stopping criterion Q1 or Q2 for the supporting tree topology as well as for each reticulation added to such a supporting tree. If HGT-Detection is performed, the values of the bipartition dissimilarity (see Boc, Philippe and Makarenkov 2010), Robinson and Foulds distance and least-squares coefficient are provided at each step of the HGT inferring algorithm.
  • Tree, reticulogram or HGT network drawing: The tree branches are depicted by full lines and reticulations or gene transfer branches are depicted by dashed lines.

A comprehensive list of software for conducting phylogenetic analysis can be found on Joe Felsenstein's software page.

The complete T-Rex Web Server version is available here

Also available for download

Windows 9x/ME/NT/2000/XP executable, documentation and sample files.

Main features:

    • Six fast and effective methods for reconstruction of phylogenetic trees from sequence or distance data (including NJ, BioNJ and ADDTREE)
    • Computes bootsrap or jackknife scores to assess the reliability of a reconstructed phylogeny
    • ClustalW sequence alignment tool is incorporated
    • Most popular sequence evolution models (including Jukes-Cantor, Kimura 2-parameters, Jin-Nei Gamma, Kimura Protein, LogDet and F84 distances) are available
    • Weighted tree and reticulogram reconstruction methods
    • Four tree reconstruction methods from partial distance matrices
    • Reticulogram reconstruction algorithm for detecting reticulate evolution processes, including hybridization or recombination events
    • Two algorithms for the detection of horizontal gene transfers
    • Graphical tree and reticulogram representations using hierarchical, axial or radial types of drawing
    • Interactive manipulation of tree and reticulogram drawing
    • Allows one to select the root of a tree or reticulogram
    • Allows one to select the color of the drawn reticulation branches, objects and edges
    • Allows one to copy images as Windows Enhanced Metafile (Windows version only) and Bitmap
    • Reads data matrices in various formats
    • Reads, saves and manipulates tree files in the Newick format

The complete documentation is included in the T-REX package.

Macintosh executable, documentation and sample files.
for PowerPC Macintoshes only

New features:

    • Prints results directly from the program
    • Progress bars indicate calculations' progress
    • Graphical tree and reticulogram representations
    • Five fast and effective additive tree reconstruction methods available
    • Weighted tree and reticulogram reconstruction methods available
    • Allows one to select the root of a tree or reticulogram
    • Allows one to select the color of the drawn reticulations

Read the documentation (also included with the above package)

32-bit DOS Console application, suitable for DOS sessions under Windows 95/98/NT. Minimal user interface, provides text-based results (fitted distance matrix, tree edge lengths and statistics).

So far, does not include the algorithm for detection of Horizontal Gene Transfers. Read the documentation (also included with the above package). A complete graphical application is in preparation.

C++ source code of the 32-bit DOS version above, so you can compile it for your favorite operating system. Make sure you also read the documentation.

 



Windows 9x/ME/NT/2000/XP version of T-Rex (Version 4.01) including a Setup utility and all Windows DLLs used by the program (7.63MB)

Windows 9x/ME/NT/2000/XP version of T-Rex (Version 3.01) including a Setup utility (2.11MB)

Windows 9x/ME/NT/2000/XP version of T-Rex (Version 3.01) requiring a quick manual installation (478KB)

Windows 9x/ME/NT/2000/XP version of T-Rex (Version 2.01) requiring a quick manual installation (476KB)

Windows 9x/ME/NT/2000/XP version of T-Rex (Version 2.01) including a Setup utility (1.56MB)

32-bit DOS version and its source code - T-Rex 2.01 (92KB)

Macintosh (Power Mac) version of T-Rex (322KB)

References:

Barthélemy, J.P., Guénoche, A. 1991. Trees and proximity representations. New York, Wiley.

Beiko, R. G., and Hamilton, N. 2006. Phylogenetic identification of lateral genetic transfer events. BMC Evol. Biol., 6, 15.

Boc, A., Makarenkov, V. 2003. New Efficient Algorithm for Detection of Horizontal Gene Transfer Events, Algorithms in Bioinformatics, Springer, WABI 2003, 190-201.

Boc, A., Philippe, H. and Makarenkov, V. 2010. Inferring and validating horizontal gene transfer events using bipartition dissimilarity, Systematic Biology, 59, 195-211.

Boc, A., Diallo, Alpha. B. and Makarenkov, V. 2011. Un nouvel algorithme pour la détection des transferts horizontaux de gènes partiels entre les espèces et pour la classification des transferts inférés, Proceedings of the 18-èmes Rencontres de la SFC, SFC-2011, Orleans, France, 25-28.

Boc, A. and Makarenkov, V. 2011. Towards an accurate identification of mosaic genes and partial horizontal gene transfers, Nucleic Acids Research, 39, e144.

Boc, A., Diallo, Alpha B. and Makarenkov, V. (2012), T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks, Nucleic Acids Research, 40 (W1), W573-579.

Cavalli-Sforza, L.L., Edwards, A.W.F. 1967. Phylogenetic analysis models and estimation procedures, American Journal of Human Genetics, 19, 233-257.

De Soete, G. 1984. Additive-Tree Representations of Incomplete Dissimilarity Data, Quality and Quantity, 18, 387-393.

Doolittle, W. F. 1999. Phylogenetic classification and the universal tree. Science 284:2124-2129.

Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucl. Acids Res., 32, 1792-1797.

Felsenstein, J. 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368–376.

Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5, 164-166.

Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates, Sunderland, Mass.

Fitch, W. M., Margoliash, E. 1967. A non-sequential method for constructing trees and hierarchical classifications, Journal of Molecular Evolution, 18, 30-37.

Gascuel, O. 1997. Concerning the NJ algorithm and its unweighted version UNJ, in Mathematical hierarchies and Biology (B. Mirkin, F.R. McMorris, F. Roberts, A. Rzhetsky, eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Amer. Math. Soc., Providence, RI, 1997, 149-171.

Gascuel, O. 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data, Mol. Biol. Evol., 14(7), 685-695.

Guindon, S. and Gascuel, O. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Mol. Biol. Evol., 19, 534-543.

Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. and Gascuel, O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol., 59, 307-321.

Gogarten, J. P., W. F. Doolittle, and J. G. Lawrence. 2002. Prokaryotic Evolution in Light of Gene Transfer. Mol. Biol. Evol. 19:2226-2238.

Guénoche, A., Leclerc B. 2001. The triangles method to build X-trees from incomplete distance matrices. RAIRO Operations Research, 35, 283--300.

Hallett, M. and Lagergren, J. 2001. Efficient algorithms for lateral gene transfer problems. In El-Mabrouk,N., Lengauer,T. and Sankoff,D. (eds.), Proceedings of the fifth annual international conference on research in computational biology. ACM Press, New-York, pp. 149-156.

Hollingshead, S.K., Becker, R. and Briles, D.E. 2000. Diversity of PspA: mosaic genes and evidence for past recombination in Streptococcus pneumoniae. Infect. Immun., 68, 5889-5900.

Jin, L. and Nei, M. 1990. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol., 7, 82-102.

Jones, D. T., Taylor, W. R. and Thornton, J. M. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci., 8, 275-282.

Jukes, T. H. and Cantor, C. R. 1969. Evolution of Protein Molecules. In Munro,H.N. (ed.), Mammalian protein metabolism. New York: Academic Press. pp. 21–123.

Katoh, K., Kuma, K., Toh, H. and Miyata,T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res., 33, 511-518.

Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.

Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Camb.

Kuhner, M. and Felsenstein, J. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol., 11, 459-468.

Landry, P. A., Lapointe, F.-J., Kirsch J. A. W. 1996. Estimating phylogenies from distance matrices: additive is superior to ultrametric estimation. Molecular Biology and Evolution 13(6): 818-823.

Landry, P.-A., Lapointe, F.-J. 1997. Estimation of Missing Distances in Path-Length Matrices: Problems and Solutions. Pp. 209-224, in Mathematical hierarchies and Biology (B. Mirkin, F.R. McMorris, F. Roberts, A. Rzhetsky, eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Amer. Math. Soc., Providence, RI, 1997, 209-224.

Lapointe, F.J.,. Legendre, P., Rohlf, J., Smouse, P., Sneath, P. 2000. Special Section dedicated to the reticulate evolution, to appear in the Journal of Classification.

Le, S. Q. and Gascuel, O. 2008. LG: an improved, general amino-acid replacement matrix. Mol. Biol. Evol., 25, 1307-1320.

Legendre, P. 2000. Biological applications of reticulation analysis, Journal of Classification 17, 153-157.

Legendre, P. and V. Makarenkov. 2002. Reconstruction of biogeographic and evolutionary networks using reticulograms. Systematic Biology 51(2): 199-216.

Levasseur, C., Landry, P. A. and Lapointe, 2000. Estimating Trees from Incomplete Distance Matrices: a Comparison of Two Methods, Data analysis, Classification and Related Methods (H. A.L. Kiers, J.-P. Rasson, P. J.F. Groenen, M. Schader, eds), 149-154.

Levasseur, C., Landry, P. A., Makarenkov, V., Kirsch, J. A. W., Lapointe, F.-J. 2003. Incomplete distance matrices, supertrees and bat phylogeny, Molecular Phylogenetics and Evolution, 239-246.

Lockhart, P. J, Steel, M. A., Hendy, M. D. and Penny, D. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol., 11, 605-612.

Maddison, W. P. 1997. Gene trees in species trees. Systematic Biology 46:523-536.

Makarenkov, V. 1997. Propriétés combinatoires des distances d'arbres. Algorithmes et applications, Ph.D. Thesis, EHESS, Paris, and Institute of Control Sciences, Moscow.

Makarenkov, V., Leclerc, B. 1997. Tree metrics and their circular orders: some uses for the reconstruction and fitting of phylogenetic trees, in Mathematical hierarchies and Biology (B. Mirkin, F.R. McMorris, F. Roberts, A. Rzhetsky, eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Amer. Math. Soc., Providence, RI, 1997, 183-208.

Makarenkov, V., Leclerc, B. 1999. An algorithm for the fitting of a tree metric according to a weighted least-squares criterion, Journal of Classification 16 3-26.

Makarenkov, V., Legendre, P. 2000. Improving the additive tree representation of a dissimilarity matrix using reticulations, In Kiers H.A.L., Rasson J.-P., Groenen P.J.F. and Schader M. (Edts), Data Analysis Classification and Related Methods, Springer, 35-40.

Makarenkov, V., B. Leclerc. 2000. Comparison of additive trees using circular orders, Journal of Computational Biology 7, 731-744.

Makarenkov, V. 2001. T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks, Bioinformatics 17, 664-668.

Makarenkov, V. 2002. Comparison of four methods for inferring phylogenetic trees from incomplete dissimilarity matrices, in Classification, Clustering, and Data Analysis, (K. Jajuga, A. Sokolowski et H.-H. Bock, eds), Springer, Cracow, Poland, 371-378.

Makarenkov, V. and Legendre, P. 2004. From a phylogenetic tree to a reticulated network, Journal of Computational Biology, 11 (1), 195-212.

Makarenkov, V., Lapointe, F.-J. 2004. A weighted least-squares approach for inferring phylogenies from incomplete distance matrices, Bioinformatics, 20, 2113-2121.

Makarenkov, V., Legendre, P. and Desdevises, Y. 2004. Modeling phylogenetic relationships using reticulated networks, Zoologica Scripta, 33 (1), 89-96.

Makarenkov, V., Kevorkov, D. and Legendre, P. 2006. Phylogenetic Network Reconstruction Approaches, Applied Mycology and Biotechnology, v. 6, Genes, Genomics and Bioinformatics, Elsevier Science.

Makarenkov, V., Boc, A. and Diallo, Alpha. B. 2007. La dissimilarité de bipartition et son utilisation pour détecter les transferts horizontaux de gènes, Proceedings of the 14-èmes Rencontres de la SFC 2007, ENST de Paris, France, 90-93.

Page, R. D. M., and M. A. Charleston. 1998. Trees within trees: phylogeny and historical associations. Trends in Ecol. and Evol. 13:356-359.

Robinson, D. R. and Foulds, L. R. 1981. Comparison of phylogenetic trees. Math. Biosciences, 53, 131-147.

Saitou, N., Nei, M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees, Molecular Biology Evolution 4: 406-425.

Sattath, S., Tversky, A. 1977. Additive similarity trees, Psychometrika 42: 319-345.

Sonea, S., Panisset, M. 1976. Pour une nouvelle bacteriologie. Revue Canadienne de Biologie, 35, 103-167.

Stamatakis, A., Hoover, P. and Rougemont, J. 2008. A rapid bootstrap algorithm for the RAxML web-servers, Syst. Biol., 75, 758-771.

Swofford, D. L., G. L. Olsen. 1996. Phylogeny reconstruction, 407-514. In D. M. Hill eds. Molecular Systematics. Sinauer.

Tajima, F. and Nei, M. 1984. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol., 1, 269-285.

Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol. Biol. Evol., 9, 678–687.

Than, C., Ruths, D. and Nakhleh, L. 2008. PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinf., 9, 322.

Thompson, J. D., Higgins, D. G. and Gibson, T. J. 1994. CLUSTAL W: Improving the sensitivity of multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucl. Acids Res., 22, 4673-4680.

Wheeler, T. J. (2009) Large-scale neighbor-joining with NINJA. In Salzberg,S. and Warnow,T (eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics, Springer Verlag, Berlin, pp. 375-389.

Whelan, S. and Goldman, N. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol., 18, 691-699.

Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.

Yushmanov, S.V. 1984. Construction of a tree with p leaves from 2p-3 elements of its distance matrix (in Russian), Matematicheskie Zametki 35: 877-887.

Zhaxybayeva, O., P. Lapierre, and J. P. Gogarten. 2004. Genome mosaicism and organismal lineages. Trends Genet. 20:254-260.

6 examples of application of reticulograms on real datasets


Last updated on Thursday, May 3, 2012 by Vladimir Makarenkov
Created on Tuesday, May 26, 1998