Multi-Objective Optimization to Improve Structure-Based Virtual Screening at Large Scale

Skorupan, Stasa

dc.contributor.author	Skorupan, Stasa
dc.contributor.supervisor	Gentile, Francesco
dc.date.accessioned	2026-05-21T19:44:27Z
dc.date.available	2026-05-21T19:44:27Z
dc.date.issued	2026-05-21
dc.description.abstract	The recent expansion of commercial chemical databases to billions of make-on-demand molecules has restructured hit identification methods in early-stage drug discovery. The exploration of these ultra-large chemical databases represents a new frontier for the identification of novel, potent, and selective drug candidates. Molecular docking is a computational technique used to predict the binding modes and affinities of small molecules in a protein binding site, enabling fast and cost-effective virtual screening (VS) of large chemical libraries to identify novel hit compounds. Machine learning (ML)-augmented molecular docking methods were proposed as a new paradigm to screen ultra-large commercial chemical libraries and extend VS to billions of molecules. These models are generally trained on a small subset of docked protein-ligand complexes to predict the binding affinities of the remainder of the dataset, thereby streamlining the identification of promising top-scoring candidates at reasonable computational cost. Molecular docking therefore remains crucial to making hit identification faster and more cost-effective. However, the simplified modelling of protein-ligand interactions introduces a significant number of artifact molecules, especially in ultra-large libraries, and visual inspection of the top-ranked docking hits remains the standard protocol for removing these artifacts. Common criteria used to assess modelled protein-ligand binding include ligand strain and unsatisfied polar ligand or protein heteroatoms, and various computational tools have therefore been developed to automate this filtering at scale. Moreover, ML-accelerated docking models can implicitly learn and propagate these inherent docking artifacts. Importantly, no current model addresses the effect of the artifacts inherent to molecular docking predictions performed at large scale.
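The abstract describes ML-accelerated docking models that are trained on a small docked subset and then predict scores for the rest of the library. The sketch below illustrates that general idea only; it is not the thesis's model, whose architecture is not described in this record. It assumes fingerprints are given as sets of on-bit indices, that lower docking scores are better, and it swaps in a simple k-nearest-neighbour regressor on Tanimoto similarity as the surrogate. The names `triage`, `knn_predict`, and `keep_fraction` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: FrozenSet[int],
                train: Dict[str, FrozenSet[int]],
                scores: Dict[str, float],
                k: int = 3) -> float:
    """Predict a docking score as the mean score of the k most similar docked molecules."""
    ranked = sorted(train, key=lambda mid: tanimoto(query, train[mid]), reverse=True)
    nearest = ranked[:k]
    return sum(scores[mid] for mid in nearest) / len(nearest)


def triage(library: Dict[str, FrozenSet[int]],
           docked_scores: Dict[str, float],
           keep_fraction: float = 0.1,
           k: int = 3) -> List[str]:
    """Return IDs of undocked molecules with the best (most negative) predicted scores.

    Hypothetical helper: the surrogate is fit only on molecules already docked.
    """
    if not docked_scores:
        raise ValueError("need at least one docked molecule to train on")
    train = {mid: library[mid] for mid in docked_scores}
    undocked = [mid for mid in library if mid not in docked_scores]
    predicted = {mid: knn_predict(library[mid], train, docked_scores, k) for mid in undocked}
    n_keep = max(1, int(len(undocked) * keep_fraction))
    return sorted(undocked, key=lambda mid: predicted[mid])[:n_keep]


# Toy usage with made-up fingerprints and scores.
library = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({1, 2, 4}),
    "c": frozenset({7, 8, 9}),
    "d": frozenset({7, 8, 10}),
    "e": frozenset({1, 2, 3, 4}),
    "f": frozenset({8, 9, 10}),
}
docked = {"a": -10.0, "c": -4.0}
selected = triage(library, docked, keep_fraction=0.5, k=1)
print(selected)
```

A k-nearest-neighbour surrogate was chosen here only because it needs no third-party libraries; any regressor trained on the docked subset could take its place.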
The research herein developed new multi-objective optimization (MOO), ML-accelerated molecular docking models to aid artifact filtering and the selection of promising candidates in the early stages of drug discovery. These models incorporate selected three-dimensional medicinal chemistry properties that were thoroughly evaluated for their potential to improve early enrichment in VS. Chapter 2 presents a retrospective evaluation of ligand strain and unsatisfied hydrogen bonds as filters in the post-processing of molecular docking results, assessing their impact on early enrichment. We found their effect on enrichment to be highly system-dependent: no single threshold enriched all of the protein-ligand datasets explored, but several proteins showed significant enrichment when strain, unsatisfied hydrogen bonds, or both were used as filters at specific thresholds. Chapter 3 presents simulated large-scale prospective VS campaigns using the developed MOO ML-accelerated molecular docking models and selected three-dimensional medicinal chemistry filters; multi-task learning (MTL) was explored to this end. In most systems studied, filtering improved early enrichment compared with molecular docking score predictions alone. MTL models showed the potential to improve early enrichment in large-scale VS while accelerating runtime (3-4x) and substantially reducing the chemical database size (by 90-99%). This research contributes an open-source, medicinal chemistry-informed, ML-accelerated molecular docking model toward the development of new drug discovery tools.
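Chapter 2 evaluates ligand strain and unsatisfied hydrogen bonds as post-docking filters by their effect on early enrichment. A minimal sketch of such a retrospective check follows. It assumes the enrichment factor (EF) at a top fraction of the ranked list as the early-enrichment metric, and it assumes molecules failing a filter are demoted below all passing molecules rather than discarded; the thesis's exact metric and thresholds are not given in this record. The `Pose` fields, threshold values, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pose:
    """One docked molecule (all fields hypothetical, for illustration)."""
    score: float        # docking score; lower is assumed better
    strain: float       # ligand strain energy, e.g. in kcal/mol
    unsat_hbonds: int   # count of unsatisfied polar heteroatoms
    active: bool        # known active (retrospective label)


def passes(p: Pose, max_strain: Optional[float], max_unsat: Optional[int]) -> bool:
    """True if the pose meets every threshold that is set (None disables that filter)."""
    if max_strain is not None and p.strain > max_strain:
        return False
    if max_unsat is not None and p.unsat_hbonds > max_unsat:
        return False
    return True


def enrichment_factor(poses: List[Pose], top_fraction: float,
                      max_strain: Optional[float] = None,
                      max_unsat: Optional[int] = None) -> float:
    """EF = (active rate in the top of the ranking) / (active rate in the whole set).

    Filter failures sort after all passing poses (False < True), so the
    denominator always refers to the full dataset.
    """
    ranked = sorted(poses, key=lambda p: (not passes(p, max_strain, max_unsat), p.score))
    n_top = max(1, int(len(poses) * top_fraction))
    hits_top = sum(p.active for p in ranked[:n_top])
    total_actives = sum(p.active for p in poses)
    if total_actives == 0:
        return 0.0
    return (hits_top / n_top) / (total_actives / len(poses))


# Toy retrospective set: two strained decoys outrank the actives by docking score alone.
poses = [
    Pose(-12.0, 9.0, 3, False),
    Pose(-11.5, 1.0, 0, True),
    Pose(-11.0, 8.0, 2, False),
    Pose(-10.5, 2.0, 1, True),
    Pose(-10.0, 1.5, 0, False),
    Pose(-9.5, 1.0, 0, False),
    Pose(-9.0, 1.0, 0, False),
    Pose(-8.5, 1.0, 0, False),
    Pose(-8.0, 1.0, 0, False),
    Pose(-7.5, 1.0, 0, False),
]
ef_raw = enrichment_factor(poses, top_fraction=0.2)
ef_filtered = enrichment_factor(poses, top_fraction=0.2, max_strain=5.0, max_unsat=1)
print(ef_raw, ef_filtered)
```

Sweeping `max_strain` and `max_unsat` over a grid separately for each target would mirror the system-dependent threshold search the abstract describes.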
dc.identifier.uri	http://hdl.handle.net/10393/51693
dc.identifier.uri	https://doi.org/10.20381/ruor-31985
dc.language.iso	en
dc.publisher	Université d'Ottawa | University of Ottawa
dc.subject	Machine Learning
dc.subject	Computational Drug Discovery
dc.subject	Molecular Docking
dc.subject	Multi-Task Learning
dc.subject	Structure-Based Virtual Screening
dc.title	Multi-Objective Optimization to Improve Structure-Based Virtual Screening at Large Scale
dc.type	Thesis	en
thesis.degree.discipline	Sciences / Science
thesis.degree.level	Masters
thesis.degree.name	MSc
uottawa.department	Chimie et sciences biomoléculaires / Chemistry and Biomolecular Sciences

Files

Original bundle

Name: Skorupan_Stasa_2026_thesis.pdf
Size: 11.24 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.51 KB
Format: Item-specific license agreed to upon submission

Collections

- Thèses, 2011 - // Theses, 2011 -