GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models
| dc.contributor.author | Almir Ahmad, Aws | |
| dc.contributor.supervisor | Mer, Arvind | |
| dc.date.accessioned | 2026-05-13T21:43:41Z | |
| dc.date.available | 2026-05-13T21:43:41Z | |
| dc.date.issued | 2026-05-13 | |
| dc.description.abstract | Gene Ontology (GO) provides a structured vocabulary for describing the function of gene products. However, the rapid growth of biomedical literature makes manual GO curation increasingly difficult to sustain. Here, we present GOFinder-AI, a computational framework that supports literature-grounded GO annotation through pre-query text mining and large language model (LLM) inference. Given a biomedical text, the system identifies candidate GO annotations and produces supporting citations, explanatory reasoning, and linked biological entities. To improve task-specific performance, we fine-tuned two general-purpose LLMs (Llama-3.1-8B and Qwen3-8B) on a large annotated dataset of more than 23,000 examples. Model performance was assessed using grouped 4-fold cross-validation, followed by evaluation on an independent test set containing more than 7,000 gene-GO associations. Fine-tuning markedly improved performance compared to zero-shot prompting. The fine-tuned Qwen3-8B-based system achieved higher predictive accuracy than GPT-5 mini, Llama-3.1-8B, and its own zero-shot counterpart. Overall, when tested on over 3,500 annotations, GOFinder-AI achieved a cumulative accuracy of 95.32% and completed document-level GO curation in under one minute on average. GOFinder-AI offers a scalable, interpretable, and transparent approach to automated GO curation. | |
| dc.identifier.uri | http://hdl.handle.net/10393/51653 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31951 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.subject | Gene Ontology (GO) | |
| dc.subject | Large language models (LLMs) | |
| dc.subject | Biomedical text mining | |
| dc.subject | Fine-tuning | |
| dc.subject | Curation | |
| dc.title | GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Médecine / Medicine | |
| thesis.degree.level | Masters | |
| thesis.degree.name | MSc | |
| uottawa.department | Biochimie, microbiologie et immunologie / Biochemistry, Microbiology and Immunology |