Machine Learning Scoring Functions to Improve Molecular Docking Against Protein-Protein Interaction Targets

Park, Sumin

Files

Park_Sumin_2025_thesis.pdf (3.09 MB)

Date

2025-08-22

Authors

Park, Sumin

Publisher

Université d'Ottawa | University of Ottawa

Abstract

Identification of novel therapeutic agents to modulate disease-specific protein targets has been a successful strategy in modern-day drug discovery. While classical targets are receptors, enzymes, and ion channels, protein-protein interaction (PPI) targets have gained popularity in recent years. PPIs regulate cellular mechanisms associated with vital life processes including signal transduction, cell proliferation, growth, differentiation, and apoptosis. While there are more than 650,000 reported PPIs in the human interactome, only a small fraction of them have been targeted and developed into clinically available drugs. The scarcity of PPIs as biological targets in the drug market derives from significant challenges posed by the structural and topological characteristics of PPI interfaces, which are expansive, flat, and hydrophobic compared to the well-defined pockets of conventional binding sites. To overcome these challenges, computational methods such as structure-based virtual screening (SBVS) have been applied to accelerate the discovery of small-molecule PPI modulators. SBVS uses molecular docking simulations to estimate binding affinity and screens large compound libraries to identify virtual hits. In the last decade, scoring functions (SFs), a major component of docking that evaluates the binding energy and pose of a given ligand, have made the transition from being physics-based to machine learning (ML)-based. Numerous studies indicate that machine learning scoring functions (MLSFs) perform better than, or at least comparably to, physics-based SFs, which has driven the development of a wide variety of MLSFs. In this work, we present new benchmarking datasets and MLSFs tailored to PPI targets, designed to improve pose selection in molecular docking.
To train and evaluate MLSFs for drug discovery against PPI targets, we constructed a database consisting of PPI inhibitor poses docked into binding pockets via re-docking and cross-docking with AutoDock and GNINA. Benchmarking this database for docking power (the ability to identify near-native binding poses) revealed significant room for improvement in pose prediction. The PPI databases were used to train and cross-validate ML models using a variety of interaction-based 3D features and architectures ranging from shallow models to graph neural networks (GNNs). Our best-performing GNN models outperformed two state-of-the-art MLSFs, GNINA and PIGNet2, demonstrating the effectiveness of utilizing interaction features (rather than atomic or molecular-level descriptors) and graph architectures on non-biased datasets. Our work enables fair evaluation of MLSFs using diverse, realistic docking scenarios and introduces a novel computational strategy for identifying small-molecule PPI inhibitors through virtual screening, paving the way for prospective pharmacological investigations of these challenging targets.
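
As an illustration of what a docking-power benchmark measures, the following minimal sketch computes a top-N success rate: the fraction of complexes for which at least one of the N highest-scored poses is near-native. It is not taken from the thesis. The function name `top_n_success_rate`, the toy data, the "higher score is better" convention, and the 2.0 Å RMSD cutoff (a commonly used convention) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Each pose is (score, rmsd_to_native_in_angstrom).
# Assumption: a higher score means the scoring function prefers the pose.
Pose = Tuple[float, float]


def top_n_success_rate(
    complexes: Dict[str, List[Pose]],
    n: int = 1,
    rmsd_cutoff: float = 2.0,  # assumed near-native threshold, in angstroms
) -> float:
    """Fraction of complexes with a near-native pose among the top-n scored poses."""
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Rank poses from best to worst score.
        ranked = sorted(poses, key=lambda p: p[0], reverse=True)
        # Count the complex as a success if any top-n pose is within the cutoff.
        if any(rmsd <= rmsd_cutoff for _, rmsd in ranked[:n]):
            hits += 1
    return hits / len(complexes)


# Hypothetical toy data: two complexes with three docked poses each.
example = {
    "complex_a": [(0.9, 1.2), (0.4, 5.0), (0.1, 7.3)],  # best-scored pose is near-native
    "complex_b": [(0.8, 6.1), (0.7, 1.8), (0.2, 9.0)],  # near-native pose is ranked second
}
top1 = top_n_success_rate(example, n=1)
top2 = top_n_success_rate(example, n=2)
print(f"top-1 success rate: {top1:.2f}, top-2 success rate: {top2:.2f}")
```

Comparing such rates across scoring functions on the same pose set is one simple way to express how well each one selects near-native poses.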

Keywords

Machine Learning, Computational Drug Discovery, Molecular Docking, Scoring Function, Protein-Protein Interaction Target

URI

http://hdl.handle.net/10393/50787
https://doi.org/10.20381/ruor-31339

Collections

- Thèses, 2011 - // Theses, 2011 -
