Imputing missing distances in molecular phylogenetics
| dc.contributor.author | Xia, Xuhua | |
| dc.date.accessioned | 2020-09-03T13:40:45Z | |
| dc.date.available | 2020-09-03T13:40:45Z | |
| dc.date.issued | 2018 | |
| dc.description.abstract | Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix, or from a set of aligned sequences with missing genes in which some sequences share no homologous sites (so that their distances need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and that the accuracy of the resulting phylogenetic tree is almost as good as that of the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method, based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca. | en_US |
| dc.description.sponsorship | NSERC | en_US |
| dc.identifier.doi | 10.7717/peerj.5321 | en_US |
| dc.identifier.issn | 2167-8359 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10393/40922 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-25148 | |
| dc.language.iso | en | en_US |
| dc.subject | Distance matrix | en_US |
| dc.subject | Imputing missing distance | en_US |
| dc.subject | Least-squares method | en_US |
| dc.subject | Phylogenetics | en_US |
| dc.title | Imputing missing distances in molecular phylogenetics | en_US |
| dc.type | Article | en_US |
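
The least-squares idea in the abstract can be illustrated with a minimal sketch. It is not the DAMBE implementation: it assumes a single missing distance, a hypothetical four-taxon data set, and a topology ((A,B),(C,D)) that is already fixed, whereas the method described above also searches tree space and optimizes several missing distances jointly. With the topology fixed, the residual of the missing pair can always be driven to zero, so the sketch fits branch lengths to the known distances only and reads the imputed value off the fitted tree. All names (`PATHS`, `solve`, `impute`) are illustrative.

```python
# Assumed topology ((A,B),(C,D)). Each row marks which branches lie on the
# path between two taxa. Branch order: a, b, c, d (terminal), e (internal).
PATHS = {
    ("A", "B"): [1, 1, 0, 0, 0],
    ("A", "C"): [1, 0, 1, 0, 1],
    ("A", "D"): [1, 0, 0, 1, 1],
    ("B", "C"): [0, 1, 1, 0, 1],
    ("B", "D"): [0, 1, 0, 1, 1],
    ("C", "D"): [0, 0, 1, 1, 0],
}


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def impute(known, missing_pair):
    """Fit branch lengths to the known distances by ordinary least squares
    (normal equations), then return the tree path length of the missing pair."""
    rows = [PATHS[pair] for pair in known]
    obs = [known[pair] for pair in known]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(k)]
    branches = solve(ata, aty)
    return sum(b * x for b, x in zip(branches, PATHS[missing_pair]))


# Hypothetical distances generated from a=0.1, b=0.2, c=0.3, d=0.4, e=0.05;
# the A-C distance is treated as missing.
known = {
    ("A", "B"): 0.30,
    ("A", "D"): 0.55,
    ("B", "C"): 0.55,
    ("B", "D"): 0.65,
    ("C", "D"): 0.70,
}
imputed = impute(known, ("A", "C"))
print(imputed)
```

In this four-taxon case the fit reduces to the four-point relation d(A,C) = d(A,D) + d(B,C) - d(B,D). With more taxa the system is overdetermined, and the same normal-equation fit averages over the redundant information, provided the known distances still determine every branch.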
