GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models

Almir Ahmad, Aws

Files

Almir_Ahmad_Aws_2026_thesis.pdf (2.23 MB)

Almir_Ahmad_Aws_2026_video.mp4 (2.13 MB)

Date

2026-05-13

Authors

Almir Ahmad, Aws

Publisher

Université d'Ottawa / University of Ottawa

Creative Commons License

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Gene Ontology (GO) provides a structured vocabulary for describing the functions of gene products. However, the rapid growth of the biomedical literature makes manual GO curation increasingly difficult to sustain. Here, we present GOFinder-AI, a computational framework that supports literature-grounded GO annotation through pre-query text mining and large language model (LLM) inference. Given a biomedical text, the system identifies candidate GO annotations and produces supporting citations, explanatory reasoning, and linked biological entities. To improve task-specific performance, we fine-tuned two general-purpose LLMs (Llama-3.1-8B and Qwen3-8B) on a large annotated dataset of more than 23,000 examples. Model performance was assessed using grouped 4-fold cross-validation, followed by evaluation on an independent test set containing more than 7,000 gene-GO associations. Fine-tuning markedly improved performance compared to zero-shot prompting. The fine-tuned Qwen3-8B-based system achieved higher predictive accuracy than GPT-5 mini, Llama-3.1-8B, and its own zero-shot counterpart. Overall, when tested on over 3,500 annotations, GOFinder-AI achieved a cumulative accuracy of 95.32% and completed document-level GO curation in under one minute on average. GOFinder-AI offers a scalable, interpretable, and transparent approach to automated GO curation.

Keywords

Gene Ontology (GO), Large language models (LLMs), Biomedical text mining, Fine-tuning, Curation

URI

http://hdl.handle.net/10393/51653
https://doi.org/10.20381/ruor-31951

Collections

- Theses, 2011 -
