# Leonardo Martins

11 posts · 12,288 views

I am a computational biologist working as a Research Associate at Imperial College London. I did my Ph.D. at the University of Tokyo with Hirohisa Kishino, and I have an M.Sc. in Biotechnology and a B.Sc. in Molecular Sciences, both completed at the University of São Paulo, Brazil. I also worked in Spain for five years as a postdoc in the Phylogenomics Lab of David Posada.

bioMCMC
8 posts

• October 10, 2014
• 11:54 AM
• 553 views

# A "parsimonious" Bayesian supertree model for estimating species trees

• February 7, 2013
• 06:07 PM
• 1,054 views

# The difference between the RF and the NNI distance

Just to complement my answer to a blog post, where I maintain that the Nearest-Neighbor Interchange (NNI) distance is not equivalent to the Robinson-Foulds (RF) distance, here is a simple example: consider two trees, T1 and T2, that differ only in the location of nodes A and B. On these trees we can naturally think of the nodes A, B, 1, ..., 6 as representing leaves, but they might also be large subtrees.

The RF distance is the number of edges (= branches) that are unique to each tree (that's why it's also called the symmetric difference), and it may be normalized so that its maximum is one. If we highlight the unique edges on trees T1 and T2, we see that the (unnormalized) RF distance is 10. For dichotomous (fully resolved) trees, the number of unique edges is the same on both trees.

The NNI distance is the minimum number of NNIs that must be applied to one tree such that it becomes equal to the other. One NNI branch swap changes exactly one edge, so it is very tempting to assume that the NNI distance can be found by looking at the distinct edges. The problem arises when the same branch is involved in more than one path of the "NNI walk". The RF distance (divided by two, for fully resolved trees) is then only a lower bound on the minimum number of NNIs.

In our example, the NNI distance between T1 and T2 is 6, one more than half the RF distance (10 / 2 = 5), since the edge splitting (1,2,3) and (4,5,6) is used twice in the NNI computation. The problem, as explained by Liam, is that simulating trees with a specified distance is hard, and the solution of using very large trees masks the cases where the distances disagree...

(Crossposted from Bioinformatics News and Reviews, my personal blog)
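To make the RF definition concrete, here is a minimal Python sketch. It is not part of the original post: the nested-tuple tree encoding, the function names, and the six-leaf toy trees are all assumptions made for illustration, and the toy trees are not the T1 and T2 discussed above. Each internal edge is stored as the bipartition (split) of the leaves it induces, and the RF distance is the size of the symmetric difference between the two sets of splits.

```python
from itertools import chain


def leaves(node):
    """Return the set of leaf labels found at or below a nested-tuple node."""
    if isinstance(node, tuple):
        return frozenset(chain.from_iterable(leaves(child) for child in node))
    return frozenset([node])


def splits(tree):
    """Return the non-trivial splits (internal edges) of an unrooted tree.

    Hypothetical encoding: the tree is a nested tuple of string labels.
    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same edge always yields the same set.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return  # edges leading to single leaves are shared by all trees
        clade = leaves(node)
        side = all_leaves - clade if reference in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree1, tree2):
    """Unnormalized RF distance: number of splits present in only one tree."""
    return len(splits(tree1) ^ splits(tree2))


# Toy trees on six leaves (assumed for illustration only).
t1 = ((("A", "B"), "C"), ("D", ("E", "F")))
t2 = ((("A", "C"), "B"), ("D", ("E", "F")))  # one NNI away from t1
t3 = ((("A", "D"), "E"), ("B", ("C", "F")))  # shares no internal edge with t1

print(rf_distance(t1, t2))  # 2
print(rf_distance(t1, t3))  # 6
```

For `t1` and `t2` the bound is tight: they are a single NNI apart and half their RF distance is 1. A pair like T1 and T2 in the post is a case where half the RF distance (5) underestimates the NNI distance (6); computing the NNI distance itself needs a search over rearrangements and is not attempted here.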

Bryant David. (2004) The Splits in the Neighborhood of a Tree. Annals of Combinatorics, 8(1), 1-11. DOI: 10.1007/s00026-004-0200-z

• May 15, 2012
• 03:29 PM
• 802 views

# Testing for common ancestry

Leonardo de Oliveira Martins, David Posada. (2012) Proving universal common ancestry with similar sequences. Trends in Evolutionary Biology, 14(1). DOI: 10.4081/eb.2012.e5

• August 6, 2011
• 04:46 PM
• 723 views

# How to summarise a collection of trees that came from a Bayesian analysis

After running a Bayesian phylogenetic analysis we are usually left with a large collection of trees that came from the posterior distribution of the model given our data. If we then want to work with a single tree (that is, to have a point estimate of this posterior distribution of trees), the most usual approaches are to calculate the consensus tree or to select the most frequent tree. There are other ways, but let's focus on those for now.

We might not be aware of it, but when we choose one summary or the other we are in fact deciding on the tree estimate that minimizes its distance to all other trees in the set, and in expectation this will be the closest to the true tree under this distance metric (the so-called Bayes estimator). This depends on what exactly we mean by "distance" between trees, and that's what the article "Bayes Estimators for Phylogenetic Reconstruction" (doi 10.1093/sysbio/syr021) is about. For example, the majority-rule consensus tree is the best we can get if we assume that the Robinson-Foulds distance (RF distance) is a good way of penalizing trees far away from the true one (I won't dwell on the meaning of "truth"; for us, the True tree® is the one that originated the data). To be more explicit, the consensus tree is the one whose total RF distance to all trees in the sample is the smallest possible. This will be the closest we can get to the true tree for this sample, if by "close" we mean "with a small RF distance".

Now suppose I don't like the RF metric because I can only count to two: if the trees are the same the distance is zero, but if they are different then the distance is not zero, and I don't care how different they are (think of apples and oranges). In this case the best representative of my sample is the one that appears most often, known as the mode or Maximum A Posteriori (MAP) value, since our sample comes from a posterior distribution. Is it the closest I can get to the true tree for this distribution? Yes, for this particular definition of distance: the MAP tree is the tree that maximizes the expected coincidence with the true tree.

In the article they also mention that if you want to find the tree that minimizes the expected quartet distance to the true value, then the quartet puzzling method will find this tree for you. But the quartet puzzling tree is not as easy to calculate as the consensus or MAP tree, and there is no straightforward way to find the tree that minimizes other distances in general (e.g. the dSPR, the geodesic distance or the Gene Tree Parsimony). Therefore the authors offer the well-known hill-climbing heuristics for finding the best tree, and use the squared path difference as an example of distance.

Below you can find the presentation I gave to my group last week about this paper; it contains some background information and a summary of their method. One thing absent from the slides is the results, which I briefly summarize below:

• their method (called "Bayes" in the figures, or "BE") always used the path difference as the distance measure; this is the overall distance they were trying to minimize.
• they simulated many data sets with several levels of sequence divergence, and reconstructed the phylogeny using Maximum Likelihood, Neighbor-Joining, and Bayesian analysis. From the Bayesian posterior distribution they selected as point estimates the consensus tree and the MAP tree, and used their method to find the BE under the path difference.
• Figures 3 and 5 show the distance between the inferred and the true trees, where in figure 3 this distance is the path difference and in figure 5 it is the RF distance. As expected, the Bayes estimator is better than any other estimate at minimizing the path difference to the true tree, while the consensus tree wins if we want the closest in terms of RF distance.
• this result is rephrased in figures 8 and 9, which now look specifically at the distances between the BE or MAP trees and the true tree. What they plot is distance(BE, true) - distance(MAP, true) for a different definition of distance(,) in each case. The MAP tree is correlated with the consensus tree (if the MAP frequency is larger than 50% they are equal, for instance). Therefore it should come as no surprise that if we define closeness to the true tree in terms of RF distance, the MAP tree will be closer than the BE, as shown in figure 9. This is because the BE assumes that closeness to the true tree is calculated in terms of the path difference, which is reinforced in figure 8.

The authors wisely avoid offering the "best" Bayes estimator, since it depends on your judgment of how to penalize trees different from the true one.

Slides: Journal Club @ UVigo 2011.07.22

Note: This was my first time using Beamer for LaTeX (after all these years, I know), so the slides are not prime-time material. This is also my first submission to SlideShare, and I like the idea of an embedded presentation within the blog post. I use LaTeX a lot, and I think it would be easier for me to prepare a post with figures, equations and text within a presentation, and then simply embed it here with a minimum of extra text. Maybe I'll try this next time: a presentation with much more text than is recommended, since in real-life presentations the slides should support and complement, but not replace, the lecturer. Then you can tell me whether you would like to read in such a format or prefer a more traditional article-ish post.
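The idea that each summary tree is the minimizer of some distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of Huggins et al.: trees are reduced to sets of split labels, the function names and the ten-tree toy sample are invented here, and the search is restricted to trees that actually occur in the sample (the paper uses hill climbing over tree space instead).

```python
from collections import Counter


def zero_one(t1, t2):
    """The 'can only count to two' distance: 0 if identical, 1 otherwise."""
    return 0 if t1 == t2 else 1


def rf(t1, t2):
    """RF distance between trees stored as sets of splits."""
    return len(t1 ^ t2)


def total_distance(candidate, sample, distance):
    """Sum of distances from one candidate tree to every sampled tree."""
    return sum(distance(candidate, tree) for tree in sample)


def best_sampled_tree(sample, distance):
    """Sampled tree with the smallest total distance to the whole sample."""
    candidates = list(dict.fromkeys(sample))  # unique trees, stable order
    return min(candidates, key=lambda c: total_distance(c, sample, distance))


def majority_rule(sample):
    """Splits present in more than half of the sampled trees."""
    counts = Counter(split for tree in sample for split in tree)
    return frozenset(s for s, n in counts.items() if n > len(sample) / 2)


# Hypothetical posterior sample on five taxa; each tree is a set of splits.
X = frozenset({"AB|CDE", "ABC|DE"})
Y = frozenset({"AB|CDE", "ABD|CE"})
W = frozenset({"AB|CDE", "ABE|CD"})
Z = frozenset({"AC|BDE", "ACE|BD"})
sample = [Z] * 4 + [X] * 3 + [Y] * 2 + [W]

map_tree = best_sampled_tree(sample, zero_one)  # most frequent tree
rf_tree = best_sampled_tree(sample, rf)         # closest sampled tree under RF
consensus = majority_rule(sample)

print(map_tree == Z, rf_tree == X)              # True True
print(sorted(consensus))                        # ['AB|CDE']
print(total_distance(rf_tree, sample, rf))      # 22
print(total_distance(consensus, sample, rf))    # 18
```

In this toy sample the MAP tree (Z, sampled four times) and the best sampled tree under RF (X) disagree, and the partially resolved majority-rule consensus, which is not in the sample at all, has a smaller total RF distance than either. Swapping in another distance function changes the answer in the same way.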

Huggins, P., Li, W., Haws, D., Friedrich, T., Liu, J., & Yoshida, R. (2011) Bayes Estimators for Phylogenetic Reconstruction. Systematic Biology, 60(4), 528-540. DOI: 10.1093/sysbio/syr021

• July 12, 2010
• 06:09 PM
• 1,685 views

# Distribution of recombination distances between trees – poster at SMBE2010

I just came back from SMBE2010, where I presented a poster about our recombination detection software and had the chance to see the awesome research other people are doing. The poster can be downloaded here (1.MB in pdf format) and I’m distributing it under the Creative Commons License. Given the great feedback I got from other [...]

• May 16, 2010
• 10:17 PM
• 1,177 views

# Fault-tolerant conversion between sequence alignments

Although I’m very charitable when testing my own programs, I’m not so nice when asked to scrutinize other people’s work. That’s why I was happy to see the announcement about the ALTER web server being published in Nucleic Acids Research (open access!). I am not involved in the project, but I was in the very [...]

Glez-Pena, D., Gomez-Blanco, D., Reboiro-Jato, M., Fdez-Riverola, F., & Posada, D. (2010) ALTER: program-oriented conversion of DNA and protein alignments. Nucleic Acids Research. DOI: 10.1093/nar/gkq321

• May 15, 2010
• 07:20 PM
• 719 views

# The Neandertal is dead! Long live the Neandertal!

Green, R., Krause, J., Briggs, A., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.... (2010) A Draft Sequence of the Neandertal Genome. Science, 328(5979), 710-722. DOI: 10.1126/science.1188021

• May 14, 2010
• 03:27 PM
• 577 views

# How to capture a Neandertal

Burbano, H., Hodges, E., Green, R., Briggs, A., Krause, J., Meyer, M., Good, J., Maricic, T., Johnson, P., Xuan, Z.... (2010) Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science, 328(5979), 723-725. DOI: 10.1126/science.1188046

• May 9, 2010
• 08:30 PM
• 973 views

# Neandertals, four years ago and today

Hublin, J., & Pääbo, S. (2006) Neandertals. Current Biology, 16(4). DOI: 10.1016/j.cub.2006.02.009

• April 27, 2010
• 07:04 PM
• 1,266 views

# The specialization of novel genes

Recently a paper about the software MANTiS caught my attention, and I’ve been trying to write about it for a while. This announcement at the EvolDir list seemed like the perfect opportunity. I must warn you though that I’ve never used the software and I’m not familiar with the underlying databases, but the [...]

Milinkovitch, M., Helaers, R., & Tzika, A. (2009) Historical Constraints on Vertebrate Genome Evolution. Genome Biology and Evolution, 13-18. DOI: 10.1093/gbe/evp052

Tzika, A., Helaers, R., Van de Peer, Y., & Milinkovitch, M. (2007) MANTIS: a phylogenetic framework for multi-species genome comparisons. Bioinformatics, 24(2), 151-157. DOI: 10.1093/bioinformatics/btm567

• March 20, 2010
• 11:10 PM
• 2,759 views

# Using System-on-a-Chip hardware to speed up alignments

In recent years there has been an explosion of parallel algorithms for solving bioinformatics problems, namely phylogenetic reconstruction and sequence alignment. These algorithms follow the growth of new hardware solutions like Field-Programmable Gate Arrays (integrated circuits capable of performing simple instructions in parallel), Cell microprocessors (like the one inside the PlayStation 3), Graphics Processing Units (NVIDIA [...]