GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models
Publisher
Université d'Ottawa / University of Ottawa
Abstract
Gene Ontology (GO) provides a structured vocabulary for describing the functions of gene products. However, the rapid growth of the biomedical literature makes manual GO curation increasingly difficult to sustain. Here, we present GOFinder-AI, a computational framework that supports literature-grounded GO annotation by combining pre-query text mining with large language model (LLM) inference. Given a biomedical text, the system identifies candidate GO annotations and produces supporting citations, explanatory reasoning, and linked biological entities. To improve task-specific performance, we fine-tuned two general-purpose LLMs, Llama-3.1-8B and Qwen3-8B, on a large annotated dataset of more than 23,000 examples. Model performance was assessed using grouped 4-fold cross-validation, followed by evaluation on an independent test set containing more than 7,000 gene-GO associations. Fine-tuning markedly improved performance over zero-shot prompting. The fine-tuned Qwen3-8B-based system achieved higher predictive accuracy than GPT-5 mini, Llama-3.1-8B, and its own zero-shot counterpart. Overall, when tested on more than 3,500 annotations, GOFinder-AI achieved a cumulative accuracy of 95.32% and completed document-level GO curation in under one minute per document on average. GOFinder-AI offers a scalable, interpretable, and transparent approach to automated GO curation.
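The abstract does not say what the cross-validation groups were. A common choice in literature-curation tasks is to group examples by source document, so that annotations drawn from one paper never appear in both the training and validation folds of the same split. The Python sketch below assumes that grouping; the `grouped_k_fold` function and the `PMID` labels are hypothetical illustrations, not the thesis's implementation. Groups are assigned greedily, largest first, to whichever fold currently holds the fewest examples.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def grouped_k_fold(
    groups: Sequence[Hashable], k: int = 4
) -> List[Tuple[List[int], List[int]]]:
    """Split example indices into k folds so that no group spans two folds.

    Hypothetical helper: each group (assumed here to be a source document)
    is placed entirely in one fold. Largest groups are placed first, each
    into the fold with the fewest examples so far.
    """
    sizes = Counter(groups)
    if len(sizes) < k:
        raise ValueError("need at least k distinct groups")

    fold_of_group = {}
    fold_sizes = [0] * k
    # Largest groups first; ties broken by repr() so the split is deterministic.
    for group, size in sorted(sizes.items(), key=lambda kv: (-kv[1], repr(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        fold_of_group[group] = target
        fold_sizes[target] += size

    splits = []
    for fold in range(k):
        # Held-out examples are exactly those whose group was assigned to this fold.
        test = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        splits.append((train, test))
    return splits


# Hypothetical example: each training example tagged with its source document ID.
doc_ids = ["PMID1", "PMID1", "PMID2", "PMID3", "PMID3", "PMID3", "PMID4", "PMID5"]
splits = grouped_k_fold(doc_ids, k=4)
print([test for _, test in splits])  # → [[3, 4, 5], [0, 1], [2, 7], [6]]
```

Grouping keeps near-duplicate context from the same document out of the held-out fold, which would otherwise inflate validation accuracy; the size-balancing step is only one reasonable heuristic for keeping folds comparable.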
Keywords
Gene Ontology (GO), Large language models (LLMs), Biomedical text mining, Fine-tuning, Curation
