Within-Pangenome Phylogeny Based on Structural Variants

Publisher

Université d'Ottawa / University of Ottawa

Creative Commons

Attribution 4.0 International

Abstract

Structural variations, particularly chromosomal inversions, are a major source of genomic diversity, yet their potential for phylogenetic reconstruction within pangenomes has not been fully established. This study examines the phylogenetic signal in inversion presence and absence patterns under a Dollo-style evolutionary framework, which assumes that each complex mutation arises only once and, once lost, is not regained. We analyzed two plant pangenomes, radish (Raphanus sativus) and cotton (Gossypium spp.). Large inversions were extracted from published resources and encoded as binary matrices. Using these matrices, we reconstructed phylogenies with the neighbour-joining method and compared them with published reference trees. The reference phylogeny for radish is based on sequence data, whereas the cotton reference tree was inferred from a genome-wide catalogue of structural variants (SVs). Tree similarity was assessed using three complementary measures: bipartition overlap, maximum agreement subtree (MAST) size, and cophenetic correlation. The inversion-based trees did not fully reproduce the reference topologies, but they retained consistent internal structure. MAST analyses revealed subsets of accessions whose relationships were stable across trees, and cophenetic correlations were clearly higher than expected under randomization. Simulation results further showed that stability under noise depends strongly on local structure. In particular, the preservation of sister pairs played a central role in maintaining agreement as the perturbation level increased. This work therefore contributes to a clearer understanding of what kinds of evolutionary information can and cannot be recovered from inversion data alone. The results indicate that inversions capture a meaningful, though incomplete, phylogenetic signal that is different from that provided by sequence data. These differences are not shortcomings; they reflect distinct evolutionary constraints acting on structural variation.
Our analyses also show that split-based (bipartition) measures perform poorly on sparse SV data, while MAST and cophenetic correlation provide more robust and interpretable assessments of tree similarity. Together, these two measures offer a practical way to evaluate phylogenetic signal in sparse SV data and to distinguish evolutionary structure from technical noise.
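
As an illustration of the comparison against randomization mentioned above, the following is a minimal sketch and not the code used in the study. It assumes trees are given as nested tuples of leaf names, takes the cophenetic distance between two leaves to be the number of edges on the path joining them (branch lengths are ignored), and builds the null distribution by shuffling leaf labels on one tree. The function names, the toy trees, and the accession labels A to E are all hypothetical.

```python
import random
from itertools import combinations


def path_distances(tree):
    """Map each leaf pair to the number of edges on the path between them.

    `tree` is a nested tuple of leaf-name strings (an assumed toy format).
    """
    dist = {}

    def walk(node):
        if isinstance(node, str):
            return [(node, 0)]
        # Leaves below each child, with depth measured from this node.
        groups = [[(leaf, d + 1) for leaf, d in walk(child)] for child in node]
        for g1, g2 in combinations(groups, 2):
            for a, da in g1:
                for b, db in g2:
                    dist[frozenset((a, b))] = da + db
        return [item for g in groups for item in g]

    walk(tree)
    return dist


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def cophenetic_correlation(d1, d2):
    """Correlate two pairwise-distance maps defined on the same leaf set."""
    pairs = list(d1)
    return pearson([d1[p] for p in pairs], [d2[p] for p in pairs])


def permutation_test(tree1, tree2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    d1, d2 = path_distances(tree1), path_distances(tree2)
    leaves = sorted({leaf for pair in d1 for leaf in pair})
    observed = cophenetic_correlation(d1, d2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = leaves[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(leaves, shuffled))
        # Shuffle the leaf labels of the second tree only.
        d2_perm = {frozenset(relabel[x] for x in pair): v
                   for pair, v in d2.items()}
        if cophenetic_correlation(d1, d2_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical reference tree and inversion-based tree on five accessions.
reference_tree = ((("A", "B"), "C"), ("D", "E"))
inversion_tree = ((("A", "B"), "D"), ("C", "E"))
r, p = permutation_test(reference_tree, inversion_tree)
print(f"cophenetic correlation = {r:.3f}, permutation p = {p:.3f}")
```

With only five leaves the permutation p-value is coarse; the sketch is meant to show the shape of the calculation rather than a realistic analysis.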

Keywords

Phylogenetic tree comparison, Pangenome, Maximum agreement subtree, Dollo model, Phylogenetic reconstruction
