Multi-Objective Optimization to Improve Structure-Based Virtual Screening at Large Scale

dc.contributor.author: Skorupan, Stasa
dc.contributor.supervisor: Gentile, Francesco
dc.date.accessioned: 2026-05-21T19:44:27Z
dc.date.available: 2026-05-21T19:44:27Z
dc.date.issued: 2026-05-21
dc.description.abstract: The recent expansion of commercial chemical databases to billions of make-on-demand molecules has restructured hit identification methods in early-stage drug discovery. The exploration of these ultra-large chemical databases represents a new frontier for the identification of novel, potent, and selective drug candidates. Molecular docking is a computational technique used to predict the binding modes and affinities of small molecules to a protein binding site, enabling fast and cost-effective virtual screening (VS) of large chemical libraries to identify novel hit compounds. Machine learning (ML)-augmented molecular docking methods were proposed as a new paradigm to screen ultra-large commercial chemical libraries and expand VS to billions of molecules. These models are generally trained on a small subset of docked protein-ligand complexes to predict the binding affinities of the remainder of the dataset, thereby streamlining the identification of promising top-scoring candidates at reasonable computational cost. Molecular docking thus remains crucial to fast and cost-effective hit identification. However, the simplified modelling of protein-ligand interactions introduces a significant number of artifact molecules, especially for ultra-large libraries, and visual inspection of the top-ranked docking hits is the standard protocol for removing these artifacts. Common criteria considered in the assessment of modelled protein-ligand binding include ligand strain and unsatisfied polar ligand or protein heteroatoms, and various computational tools have been developed to automate this filtering process at scale. Moreover, ML-accelerated docking models can implicitly learn and propagate these inherent molecular docking artifacts. Importantly, no current model addresses the effect of artifacts inherent to molecular docking predictions performed at large scale.
The research herein developed new multi-objective optimization (MOO), ML-accelerated molecular docking models to aid in artifact filtering and the selection of promising candidates in the early stages of drug discovery. These models incorporate selected three-dimensional medicinal chemistry properties that were thoroughly evaluated for their potential to improve early enrichment in VS. Chapter 2 presents a retrospective evaluation of ligand strain and unsatisfied hydrogen bonds as filters in the post-processing of molecular docking results, assessing their impact on early enrichment. We found their effect on enrichment to be highly system-dependent: no single threshold improved enrichment across all protein-ligand datasets explored, but several proteins showed significant enrichment when using strain, unsatisfied hydrogen bonds, or both as filters at specific thresholds. Chapter 3 presents simulated large-scale prospective VS campaigns with the developed MOO ML-accelerated molecular docking models and selected three-dimensional medicinal chemistry filters. Multi-task learning (MTL) was explored to this end. In most systems studied, filtering improved early enrichment compared to molecular docking score predictions alone. MTL models showed the potential to improve early enrichment in large-scale VS, while accelerating runtime (3-4x) and significantly reducing the chemical database size (by 90-99%). This research contributes an open-source, medicinal chemistry-informed, ML-accelerated molecular docking model towards the development of new drug discovery tools.
dc.identifier.uri: http://hdl.handle.net/10393/51693
dc.identifier.uri: https://doi.org/10.20381/ruor-31985
dc.language.iso: en
dc.publisher: Université d'Ottawa | University of Ottawa
dc.subject: Machine Learning
dc.subject: Computational Drug Discovery
dc.subject: Molecular Docking
dc.subject: Multi-Task Learning
dc.subject: Structure-Based Virtual Screening
dc.title: Multi-Objective Optimization to Improve Structure-Based Virtual Screening at Large Scale
dc.type: Thesis (en)
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Chimie et sciences biomoléculaires / Chemistry and Biomolecular Sciences

Files

Original bundle

Name: Skorupan_Stasa_2026_thesis.pdf
Size: 11.24 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.51 KB
Format: Item-specific license agreed upon to submission