Deep Learning for Protein-Protein Interaction Prediction and Protein Design

Wu, Junzheng

dc.contributor.author	Wu, Junzheng
dc.contributor.supervisor	Viktor, Herna
dc.contributor.supervisor	Paquet, Eric
dc.contributor.supervisor	Michalowski, Wojtek
dc.date.accessioned	2023-03-23T15:55:24Z
dc.date.available	2023-03-23T15:55:24Z
dc.date.issued	2023-03-23	en_US
dc.description.abstract	Protein–protein interactions (PPI) play a fundamental role in many biochemical functions such as signal transduction, cellular organization, and cell cycle progression. Laboratory experiments are expensive and time-consuming, limiting their applicability to a handful of cases, which severely impacts the speed of research innovation. In contrast, computational experiments can explore and screen a vast number of possibilities, leaving researchers with the most promising cases. For this reason, there is an urgent need to develop computational solutions for accurately determining whether proteins interact. Specifically, developing deep learning solutions to predict PPI constitutes an emerging area of research with many practical applications. This thesis focuses on advancing the research in this area. In our first contribution, the thesis introduces a novel deep learning algorithm for predicting PPI solely from the proteins' amino acid sequences. The novelty of our work is that we consider self-binding and folding properties rather than only focusing on the sequences per se, as is commonplace in the state of the art. Our Siamese pyramid network (SPNet) architecture comprises a multilevel Siamese neural network with an attention mechanism and a trainable probability prediction network. Our experimental evaluation indicates that SPNet outperforms the state of the art against strict data sets, that is, data sets with no data leakage, leading to accurate and timely predictions. Our second contribution centers on the observation that the sizes and compositions of PPI data sets vary considerably, which may affect learnability for deep learning solutions. For instance, small data sets are commonplace for rare and new diseases, while current deep learning implementations usually rely on large PPI data sets to produce accurate predictions. 
To this end, we introduce an adaptive and learnable pyramid network architecture in which the depth and the complexity of the network are directly learned from the data. Our experimental evaluation shows that these characteristics make our solution suitable for both small and large PPI data sets, ranging in size from a few thousand to seventeen million pairs. In addition, negative (non-interacting) protein pairs are generally scarce, resulting in highly imbalanced data sets that may impede neural network training. Therefore, our third contribution centers on a novel methodology, based on graphs and geodesic distances, that extracts non-interacting proteins from the graph associated with their interactions and binding strength. A new balanced B-STRING data set is created that consists of 17,591,832 protein pairs. Our comparative experimental evaluation not only confirms the value of our learnable pyramid network architecture but also shows the value of providing a novel benchmark data set for future use by the community. In our final contribution, we introduce our architecture for designing protein binders based on primary sequences through a transformer deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input sequence. Predicting protein binders is a multivalued problem, which implies that there is more than one solution, since there may be more than one binder for a specific protein. To solve such a learning task, we utilize two types of constraints in our deep learning solution. These are based, respectively, on the binding score, which is related to the strength of interactions, and the Bayesian prior, where we assume that a small portion of the ligands’ amino acids are known. Our experimental evaluation confirms the strengths of this novel approach.	en_US
dc.identifier.uri	http://hdl.handle.net/10393/44730
dc.language.iso	en	en_US
dc.publisher	Université d'Ottawa / University of Ottawa	en_US
dc.rights	Attribution-NonCommercial 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	*
dc.subject	Deep Learning	en_US
dc.subject	Protein-protein Interaction	en_US
dc.subject	Protein Design	en_US
dc.title	Deep Learning for Protein-Protein Interaction Prediction and Protein Design	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Sciences / Science	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	PhD	en_US
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en_US

Files

Original bundle

Name: Wu_Junzheng_2023_thesis.pdf
Size: 8.06 MB
Format: Adobe Portable Document Format

Collections

- Thèses, 2011 - // Theses, 2011 -