GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models

dc.contributor.author: Almir Ahmad, Aws
dc.contributor.supervisor: Mer, Arvind
dc.date.accessioned: 2026-05-13T21:43:41Z
dc.date.available: 2026-05-13T21:43:41Z
dc.date.issued: 2026-05-13
dc.description.abstract: The Gene Ontology (GO) provides a structured vocabulary for describing the functions of gene products. However, the rapid growth of biomedical literature makes manual GO curation increasingly difficult to sustain. Here, we present GOFinder-AI, a computational framework that supports literature-grounded GO annotation through pre-query text mining and large language model (LLM) inference. Given a biomedical text, the system identifies candidate GO annotations and produces supporting citations, explanatory reasoning, and linked biological entities. To improve task-specific performance, we fine-tuned two general-purpose LLMs (Llama-3.1-8B and Qwen3-8B) on a large annotated dataset of more than 23,000 examples. Model performance was assessed using grouped 4-fold cross-validation, followed by evaluation on an independent test set containing more than 7,000 gene-GO associations. Fine-tuning markedly improved performance compared with zero-shot prompting. The fine-tuned Qwen3-8B-based system achieved higher predictive accuracy than GPT-5 mini, Llama-3.1-8B, and its own zero-shot counterpart. Overall, when tested on over 3,500 annotations, GOFinder-AI achieved a cumulative accuracy of 95.32% and completed document-level GO curation in under one minute on average. GOFinder-AI offers a scalable, interpretable, and transparent approach to automated GO curation.
dc.identifier.uri: http://hdl.handle.net/10393/51653
dc.identifier.uri: https://doi.org/10.20381/ruor-31951
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (en)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Gene Ontology (GO)
dc.subject: Large language models (LLMs)
dc.subject: Biomedical text mining
dc.subject: Fine-tuning
dc.subject: Curation
dc.title: GOFinder-AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models
dc.type: Thesis (en)
thesis.degree.discipline: Médecine / Medicine
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Biochimie, microbiologie et immunologie / Biochemistry, Microbiology and Immunology

Files

Original bundle

Name: Almir_Ahmad_Aws_2026_thesis.pdf
Size: 2.23 MB
Format: Adobe Portable Document Format
Name: Almir_Ahmad_Aws_2026_video.mp4
Size: 2.13 MB
Format: MP4 Container format for video files

License bundle

Name: license.txt
Size: 2.51 KB
Format: Item-specific license agreed upon to submission